git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years ago .gitignore: Add m4 macro directories to ignore list
Gary Lowell [Thu, 6 Dec 2012 03:39:11 +0000 (19:39 -0800)]
.gitignore:  Add m4 macro directories to ignore list

12 years ago build: Add RPM release string generated from git describe.
Gary Lowell [Thu, 8 Nov 2012 20:43:24 +0000 (12:43 -0800)]
build:  Add RPM release string generated from git describe.

Fix for bug 3451.  Use the commit count and sha1 from git describe to
construct a release string for rpm packages.

Conflicts:

configure.ac
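
The release-string idea described above can be sketched in Python. This is only an illustration, not the configure.ac change itself: the helper name rpm_release_from_describe, the sample inputs, and the "count.gSHA" output layout are assumptions.

```python
import re


def rpm_release_from_describe(describe):
    """Build a release string from `git describe` output (hypothetical helper).

    `git describe` prints `<tag>-<count>-g<sha1>` when HEAD is past a tag,
    or just `<tag>` when HEAD sits exactly on the tag.
    """
    m = re.match(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$", describe)
    if m is None:
        # Exactly on a tag: there is no commit count or sha1 suffix.
        return "0"
    # Assumed layout: commit count first, then the abbreviated sha1.
    return "{}.g{}".format(m.group("count"), m.group("sha"))
```

Putting the commit count first keeps successive builds from the same tag sorting in commit order.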

12 years ago ceph.spec.in: Build debuginfo subpackage.
Gary Lowell [Fri, 9 Nov 2012 21:28:13 +0000 (13:28 -0800)]
ceph.spec.in:  Build debuginfo subpackage.

This is a partial fix for bug 3471.  Enable building of debuginfo package.
Some distributions enable this automatically by installing additional rpm
macros; on others it needs to be explicitly added to the spec file.

12 years ago rgw: fix swift auth concurrency issue
Yehuda Sadeh [Mon, 3 Dec 2012 22:32:28 +0000 (14:32 -0800)]
rgw: fix swift auth concurrency issue

Fixes: #3565
Originally ops were using static structures, but that
has since changed. Switching swift auth handler to do
the same.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

12 years ago rgw: fix rgw_tools get_obj()
Yehuda Sadeh [Thu, 29 Nov 2012 21:39:22 +0000 (13:39 -0800)]
rgw: fix rgw_tools get_obj()

The original implementation broke whenever data exceeded
the chunk size. Also don't keep cache for objects that
exceed the chunk size as cache is not designed for
it. Increased chunk size to 512k.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

12 years ago rgw: fix PUT acls
Yehuda Sadeh [Thu, 29 Nov 2012 20:47:59 +0000 (12:47 -0800)]
rgw: fix PUT acls

This fixes a regression introduced at
17e4c0df44781f5ff1d74f3800722452b6a0fc58. The original
patch fixed an error leak; however, it also removed the
operation's send_response() call.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

12 years ago rgw: fix xml parser leak
Yehuda Sadeh [Tue, 20 Nov 2012 01:10:11 +0000 (17:10 -0800)]
rgw: fix xml parser leak

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit f86522cdfcd81b2d28c581ac8b8de6226bc8d1a4)

12 years ago rgw: fix memory leaks
Yehuda Sadeh [Tue, 20 Nov 2012 00:52:38 +0000 (16:52 -0800)]
rgw: fix memory leaks

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 98a04d76ebffa61c3ba4b033cdd57ac57b2f29f3)

Conflicts:
src/rgw/rgw_op.cc
src/rgw/rgw_op.h

12 years ago rgw: don't convert object mtime to UTC
Yehuda Sadeh [Wed, 7 Nov 2012 21:21:15 +0000 (13:21 -0800)]
rgw: don't convert object mtime to UTC

Fixes: #3452
When we read object info, don't try to convert mtime to
UTC, it's already in UTC.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

12 years ago rgw: relax date format check
Yehuda Sadeh [Wed, 14 Nov 2012 19:30:34 +0000 (11:30 -0800)]
rgw: relax date format check

Don't try to parse beyond the GMT or UTC. Some clients use
special date formatting. If we end up misparsing the date,
it will fail in authorization anyway, so we don't need to be too
restrictive.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

12 years ago ceph-disk-activate: avoid duplicating mounts if already activated
Sage Weil [Tue, 30 Oct 2012 21:17:56 +0000 (14:17 -0700)]
ceph-disk-activate: avoid duplicating mounts if already activated

If the given device is already mounted at the target location, do not
mount --move it again and create a bunch of dup entries in the /etc/mtab
and kernel mount table.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c435d314caeb5424c1f4482ad02f8a085317ad5b)

12 years ago ceph-disk-prepare: poke kernel into refreshing partition tables
Sage Weil [Fri, 26 Oct 2012 04:21:18 +0000 (21:21 -0700)]
ceph-disk-prepare: poke kernel into refreshing partition tables

Prod the kernel to refresh the partition table after we create one.  The
partprobe program is packaged with parted, which we already use, so this
introduces no new dependency.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 402e1f5319a52c309eca936081fddede1f107268)

12 years ago ceph-disk-prepare: fix journal partition creation
Sage Weil [Fri, 26 Oct 2012 04:20:21 +0000 (21:20 -0700)]
ceph-disk-prepare: fix journal partition creation

The end value needs to have + to indicate it is relative to wherever the
start is.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 2e32a0ee2d9e2a3bf5b138f50efc5fba8d5b8660)

12 years ago ceph-disk-prepare: assume parted failure means no partition table
Sage Weil [Fri, 26 Oct 2012 01:14:47 +0000 (18:14 -0700)]
ceph-disk-prepare: assume parted failure means no partition table

If the disk has no valid label we get an error like

  Error: /dev/sdi: unrecognised disk label

Assume any error we get is that and go with an id label of 1.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 8921fc7c7bc28fb98334c06f1f0c10af58085085)

12 years ago Merge remote-tracking branch 'gh/wip-mds-stable' into stable
Sage Weil [Mon, 12 Nov 2012 19:24:00 +0000 (11:24 -0800)]
Merge remote-tracking branch 'gh/wip-mds-stable' into stable

12 years ago mds: re-try_set_loner() after doing evals in eval(CInode*, int mask)
Sage Weil [Fri, 9 Nov 2012 13:28:12 +0000 (05:28 -0800)]
mds: re-try_set_loner() after doing evals in eval(CInode*, int mask)

Consider a case where current loner is A and wanted loner is B.
At the top of the function we try to set the loner, but that may fail
because we haven't processed the gathered caps yet for the previous
loner.  In the body we do that and potentially drop the old loner, but we
do not try_set_loner() again on the desired loner.

Try after our drop.  If it succeeds, loop through the evals one more time
so that we can issue caps appropriately.

This fixes a hang induced by a simple loop like:

 while true ; do echo asdf >> mnt.a/foo ; tail mnt.b/foo ; done &
 while true ; do ls mnt.a mnt.b ; done

(The second loop may not be necessary.)

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago CompatSet: users pass bit indices rather than masks
Samuel Just [Fri, 13 Jul 2012 21:23:27 +0000 (14:23 -0700)]
CompatSet: users pass bit indices rather than masks

CompatSet users number the Feature objects rather than
providing masks.  Thus, we should do

mask |= (1 << f.id) rather than mask |= f.id.

In order to detect old, broken encodings, the lowest
bit will be set in memory but not set in the encoding.
We can reconstruct the correct mask from the names map.

This bug can cause an incompat bit to not be detected
since 1|2 == 1|2|3.

fixes: #2748

Signed-off-by: Samuel Just <sam.just@inktank.com>
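
The difference between the two mask computations above can be shown with a small Python sketch. The function names are hypothetical and the code is not taken from CompatSet; it only reproduces the arithmetic in the commit message.

```python
def mask_from_indices(ids):
    """Correct form: each feature id is a bit index, so shift before OR-ing."""
    mask = 0
    for i in ids:
        mask |= 1 << i
    return mask


def mask_from_ids_buggy(ids):
    """Buggy form: OR-s the raw ids together as if they were already masks."""
    mask = 0
    for i in ids:
        mask |= i
    return mask
```

With the buggy form, the feature sets {1, 2} and {1, 2, 3} both collapse to the value 3, so feature 3 cannot be told apart. With the shifted form they yield 6 and 14.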

12 years ago ceph.spec.in: Remove ceph version requirement from ceph-fuse package.
Gary Lowell [Wed, 7 Nov 2012 20:41:10 +0000 (12:41 -0800)]
ceph.spec.in:  Remove ceph version requirement from ceph-fuse package.

The ceph-fuse rpm package now only requires ceph as a pre-req, not a specific
version.

12 years ago rgw: fix multipart overwrite
Yehuda Sadeh [Wed, 24 Oct 2012 20:15:46 +0000 (13:15 -0700)]
rgw: fix multipart overwrite

Fixes: #3400
Removed a few lines of code that prematurely created the head
part of the final object (before creating the manifest).

backport:argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

12 years ago mds: move to from loner -> mix if *anyone* wants rd|wr
Sage Weil [Tue, 6 Nov 2012 07:27:13 +0000 (23:27 -0800)]
mds: move to from loner -> mix if *anyone* wants rd|wr

We were either going to MIX or SYNC depending on whether non-loners wanted
to read/write, but it may be that the loner wants to if our logic for
choosing loner vs not loner is based on anything other than just rd|wr
wanted.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago mds: base loner decision on wanted RD|WR|EXCL, not CACHE|BUFFER
Sage Weil [Tue, 6 Nov 2012 07:26:09 +0000 (23:26 -0800)]
mds: base loner decision on wanted RD|WR|EXCL, not CACHE|BUFFER

Observed instance where one client wanted the Fc cap and prevented the
loner from getting RD|WR caps.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago osd: make pool_snap_info_t encoding backward compatible
Sage Weil [Tue, 30 Oct 2012 16:00:11 +0000 (09:00 -0700)]
osd: make pool_snap_info_t encoding backward compatible

Way back in fc869dee1e8a1c90c93cb7e678563772fb1c51fb (v0.42) when we redid
the osd type encoding we forgot to make this conditionally encode the old
format for old clients.  In particular, this means that kernel clients
will fail to decode the osdmap if there is a rados pool with a pool-level
snapshot defined.

Fixes: #3290
Signed-off-by: Sage Weil <sage@inktank.com>
Conflicts:

src/test/encoding/types.h

12 years ago osd/OSD.cc: Fix typo in OSD::heartbeat_check()
Yan, Zheng [Fri, 7 Sep 2012 05:49:27 +0000 (13:49 +0800)]
osd/OSD.cc: Fix typo in OSD::heartbeat_check()

The check 'p->second.last_tx > cutoff' should always be false
since last_tx is periodically updated by OSD::heartbeat()

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago rgw: dump an error message if FCGX_Accept fails
Yehuda Sadeh [Mon, 22 Oct 2012 23:52:11 +0000 (16:52 -0700)]
rgw: dump an error message if FCGX_Accept fails

Adding missing debug info.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

12 years ago workqueue: make debug output include active threads
Sage Weil [Mon, 22 Oct 2012 22:38:30 +0000 (15:38 -0700)]
workqueue: make debug output include active threads

Include active thread count in threadpool debug output.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago rgw: don't continue processing of GET request on error
Yehuda Sadeh [Mon, 22 Oct 2012 20:16:59 +0000 (13:16 -0700)]
rgw: don't continue processing of GET request on error

Fixes #3381
We continued processing requests long after the client
has died. This fix applies to both s3 and swift.

backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

12 years ago osd: be quiet about watches
Sage Weil [Fri, 19 Oct 2012 15:46:19 +0000 (08:46 -0700)]
osd: be quiet about watches

Useless log noise.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago addr_parsing: make , and ; and ' ' all delimiters
Sage Weil [Thu, 18 Oct 2012 00:44:12 +0000 (17:44 -0700)]
addr_parsing: make , and ; and ' ' all delimiters

Instead of just ,.  Currently "foo.com, bar.com" will fail because of the
space after the comma.  This patch fixes that, and makes all delim
chars interchangeable.

Signed-off-by: Sage Weil <sage@inktank.com>
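
The delimiter behaviour described above can be illustrated in Python. The helper split_addrs is hypothetical and is not the Ceph address parser; it only shows comma, semicolon, and space being treated interchangeably.

```python
import re


def split_addrs(s):
    """Split an address list on any run of ',', ';' or ' ' (illustrative only)."""
    # Empty pieces (from leading or trailing delimiters) are dropped.
    return [part for part in re.split(r"[,; ]+", s) if part]
```

Because a run of delimiters is treated as a single separator, "foo.com, bar.com" yields two entries rather than failing on the space after the comma.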

12 years ago ceph-disk-prepare, debian/control: Support external journals.
Tommi Virtanen [Fri, 5 Oct 2012 17:57:42 +0000 (10:57 -0700)]
ceph-disk-prepare, debian/control: Support external journals.

Previously, ceph-disk-* would only let you use a journal that was a
file inside the OSD data directory. With this, you can do:

  ceph-disk-prepare /dev/sdb /dev/sdb

to put the journal as a second partition on the same disk as the OSD
data (might save some file system overhead), or, more interestingly:

  ceph-disk-prepare /dev/sdb /dev/sdc

which makes it create a new partition on /dev/sdc to use as the
journal. Size of the partition is decided by $osd_journal_size.
/dev/sdc must be a GPT-format disk. Multiple OSDs may share the same
journal disk (using separate partitions); this way, a single fast SSD
can serve as journal for multiple spinning disks.

The second use case currently requires parted, so a Recommends: for
parted has been added to Debian packaging.

Closes: #3078
Closes: #3079
Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago rgw: don't add port to url if already has one
Yehuda Sadeh [Mon, 15 Oct 2012 16:43:47 +0000 (09:43 -0700)]
rgw: don't add port to url if already has one

Fixes: #3296
Specifically, if the host name string already has ':', then
don't try to append the port (swift auth).

backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
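
A minimal Python sketch of the check described above. The helper name with_port and the sample values are assumptions, and this is not the rgw swift auth code; it mirrors only the "already has ':'" test from the commit message.

```python
def with_port(host, port):
    """Append ':port' unless the host string already contains ':' (illustrative)."""
    if ":" in host:
        # The host string already carries a port, so leave it untouched.
        return host
    return "{}:{}".format(host, port)
```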

12 years ago admin_socket: fix '0' protocol version
Sage Weil [Mon, 15 Oct 2012 23:37:05 +0000 (16:37 -0700)]
admin_socket: fix '0' protocol version

Broken by 895e24d198ced83ab7fed3725f12f75e3bc97b0b.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago mon: drop command replies on paxos reset
Sage Weil [Tue, 9 Oct 2012 00:14:22 +0000 (17:14 -0700)]
mon: drop command replies on paxos reset

If paxos resets, do not send the reply for the commit we were waiting for;
let the command be reprocessed and re-proposed.

Among other things, this could lead to nondeterministic results for
'ceph osd create <uuid>'.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago Merge remote-tracking branch 'gh/for-stable-fstypes-and-ext-journal' into stable
Sage Weil [Tue, 9 Oct 2012 04:02:51 +0000 (21:02 -0700)]
Merge remote-tracking branch 'gh/for-stable-fstypes-and-ext-journal' into stable

12 years ago ceph-authtool: Fix usage, it's --print-key not --print.
Tommi Virtanen [Thu, 2 Aug 2012 20:02:04 +0000 (13:02 -0700)]
ceph-authtool: Fix usage, it's --print-key not --print.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago upstart: OSD journal can be a symlink; if it's dangling, don't start.
Tommi Virtanen [Fri, 5 Oct 2012 16:22:34 +0000 (09:22 -0700)]
upstart: OSD journal can be a symlink; if it's dangling, don't start.

This lets a $osd_data/journal symlink point to
/dev/disk/by-partuuid/UUID and the osd will not attempt to start until
that disk is available.

Signed-off-by: Tommi Virtanen <tv@inktank.com>
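
The dangling-symlink test described above can be expressed in Python as follows. The Upstart job itself is not shown in this log, so the helper name is_dangling_symlink is an assumption and the code is only a sketch of the condition.

```python
import os


def is_dangling_symlink(path):
    """True if path is a symlink whose target does not exist (illustrative).

    os.path.islink() inspects the link itself, while os.path.exists()
    follows it, so the combination detects a link with a missing target.
    """
    return os.path.islink(path) and not os.path.exists(path)
```

A journal symlink pointing at /dev/disk/by-partuuid/UUID stays dangling until the disk appears, which is the state in which the OSD should not be started.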

12 years ago osd: Make --get-journal-fsid not really start the osd.
Sage Weil [Fri, 5 Oct 2012 16:10:31 +0000 (09:10 -0700)]
osd: Make --get-journal-fsid not really start the osd.

This way, it won't need -i ID and it won't access the osd_data_dir.
That makes it useful for locating the right osd to use with an
external journal partition.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago osd: Make --get-journal-fsid not attempt aio or direct_io.
Tommi Virtanen [Fri, 5 Oct 2012 16:08:56 +0000 (09:08 -0700)]
osd: Make --get-journal-fsid not attempt aio or direct_io.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago ceph-disk-prepare: Use the OSD uuid as the partition GUID.
Tommi Virtanen [Thu, 4 Oct 2012 23:03:40 +0000 (16:03 -0700)]
ceph-disk-prepare: Use the OSD uuid as the partition GUID.

This will make locating the right data partition for a given journal
partition a lot easier.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago debian/control, ceph-disk-prepare: Depend on xfsprogs, use xfs by default.
Tommi Virtanen [Wed, 3 Oct 2012 19:38:38 +0000 (12:38 -0700)]
debian/control, ceph-disk-prepare: Depend on xfsprogs, use xfs by default.

Ext4 as a default is a bad choice, as we don't perform enough QA with
it. To use XFS as the default for ceph-disk-prepare, we need to depend
on xfsprogs.

btrfs-tools is already recommended, so no change there. If you set
osd_fs_type=btrfs, and don't have the package installed, you'll just
get an error message.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago ceph-disk-{prepare,activate}: Default mkfs arguments and mount options.
Tommi Virtanen [Wed, 3 Oct 2012 17:13:17 +0000 (10:13 -0700)]
ceph-disk-{prepare,activate}: Default mkfs arguments and mount options.

The values for the settings were copied from teuthology task "ceph".

Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago ceph-disk-prepare: Avoid triggering activate before prepare is done.
Tommi Virtanen [Wed, 3 Oct 2012 15:47:20 +0000 (08:47 -0700)]
ceph-disk-prepare: Avoid triggering activate before prepare is done.

Earlier testing never saw this, but now a mount of a disk triggers a
udev blockdev-added event, causing ceph-disk-activate to run even
before ceph-disk-prepare has had a chance to write the files and
unmount the disk.

Avoid this by using a temporary partition type uuid ("ceph 2 be"), and
only setting it to the permanent one ("ceph osd") once prepare is done.
The hotplug event won't match the type uuid, and thus won't trigger
ceph-disk-activate.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago ceph-disk-activate: Add a comment about user_xattr being default now.
Tommi Virtanen [Wed, 3 Oct 2012 00:06:11 +0000 (17:06 -0700)]
ceph-disk-activate: Add a comment about user_xattr being default now.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago ceph-disk-activate: Use mount options from ceph.conf
Tommi Virtanen [Tue, 2 Oct 2012 23:53:35 +0000 (16:53 -0700)]
ceph-disk-activate: Use mount options from ceph.conf

Always uses default cluster name ("ceph") for now, see
http://tracker.newdream.net/issues/3253

Closes: #2548
Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago ceph-disk-activate: Refactor to extract detect_fstype call.
Tommi Virtanen [Tue, 2 Oct 2012 23:43:08 +0000 (16:43 -0700)]
ceph-disk-activate: Refactor to extract detect_fstype call.

This allows us to use the fstype for a config lookup.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago ceph-disk-activate: Unmount on errors (if it did the mount).
Tommi Virtanen [Tue, 2 Oct 2012 23:37:07 +0000 (16:37 -0700)]
ceph-disk-activate: Unmount on errors (if it did the mount).

This cleans up the error handling to not leave disks mounted
in /var/lib/ceph/tmp/mnt.* when something fails, e.g. when
the ceph command line tool can't talk to mons.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago ceph-disk-prepare: Allow setting mkfs arguments and mount options in ceph.conf
Tommi Virtanen [Tue, 2 Oct 2012 23:23:55 +0000 (16:23 -0700)]
ceph-disk-prepare: Allow setting mkfs arguments and mount options in ceph.conf

Tested with meaningless but easy-to-verify values:

  [global]
  osd_fs_type = xfs
  osd_fs_mkfs_arguments_xfs = -i size=512
  osd_fs_mount_options_xfs = noikeep

ceph-disk-activate does not respect the mount options yet.

Closes: #2549
Signed-off-by: Tommi Virtanen <tv@inktank.com>

12 years ago ceph-disk-prepare: Allow specifying fs type to use.
Tommi Virtanen [Tue, 2 Oct 2012 23:04:15 +0000 (16:04 -0700)]
ceph-disk-prepare: Allow specifying fs type to use.

Either use ceph.conf variable osd_fs_type or command line option
--fs-type=

Default is still ext4, as currently nothing guarantees xfsprogs
or btrfs-tools are installed.

Currently both btrfs and xfs seem to trigger a disk hotplug event at
mount time, thus triggering a useless and unwanted ceph-disk-activate
run. This will be worked around in a later commit.

Currently mkfs and mount options cannot be configured.

Bug: #2549
Signed-off-by: Tommi Virtanen <tv@inktank.com>

13 years ago rgw: copy_object should not override ETAG implicitly
Yehuda Sadeh [Wed, 26 Sep 2012 22:43:56 +0000 (15:43 -0700)]
rgw: copy_object should not override ETAG implicitly

When copying an object with new attrs, we still need to
maintain the ETAG.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

13 years ago rgw: url_decode should allocate extra byte for dest
Yehuda Sadeh [Tue, 25 Sep 2012 01:10:24 +0000 (18:10 -0700)]
rgw: url_decode should allocate extra byte for dest

Was missing extra byte for null termination

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

13 years ago v0.48.2argonaut v0.48.2argonaut
Sage Weil [Tue, 11 Sep 2012 20:04:50 +0000 (13:04 -0700)]
v0.48.2argonaut

13 years ago cls_rgw: if stats drop below zero, set them to zero
Yehuda Sadeh [Tue, 18 Sep 2012 20:45:27 +0000 (13:45 -0700)]
cls_rgw: if stats drop below zero, set them to zero

This complements fix for #3127. This is only a band aid
solution for argonaut, the real solution fixes the original
issue that made this possible.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

13 years ago cls_rgw: change scoping of suggested changes vars
Yehuda Sadeh [Wed, 12 Sep 2012 23:41:17 +0000 (16:41 -0700)]
cls_rgw: change scoping of suggested changes vars

Fixes: #3127
Bad variable scoping made it so that specific variables
weren't initialized between suggested changes iterations.
This specifically affected a case where in a specific
change we had an update followed by a remove, and the
remove was on a non-existent key (e.g., was already
removed earlier). We ended up re-subtracting the
object stats, as the entry wasn't reset between
the iterations (and we didn't read it because the
key didn't exist).

backport:argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
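
The scoping problem described above can be modelled loosely in Python. This is a simplified sketch, not the cls_rgw code: the apply_changes helper, the (op, key, size) tuples, and the single "bytes" counter are all assumptions made for illustration.

```python
def apply_changes(index, stats, changes):
    """Apply (op, key, size) changes to an index while keeping stats in step."""
    for op, key, size in changes:
        # Looked up afresh on every iteration, so a failed lookup can never
        # fall back to the entry left over from the previous change.
        entry = index.get(key)
        if op == "update":
            if entry is not None:
                stats["bytes"] -= entry
            index[key] = size
            stats["bytes"] += size
        elif op == "remove":
            if entry is None:
                continue  # key is already gone: nothing to subtract
            stats["bytes"] -= entry
            del index[key]
    return stats
```

With the lookup result scoped to the loop body, a second remove of the same key leaves the stats untouched instead of subtracting the stale entry again.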

13 years ago objecter: fix osdmap wait
Sage Weil [Tue, 4 Sep 2012 18:29:21 +0000 (11:29 -0700)]
objecter: fix osdmap wait

When we get a pool_op_reply, we find out which osdmap we need to wait for.
The wait_for_new_map() code was feeding that epoch into
maybe_request_map(), which was feeding it to the monitor with the subscribe
request.  However, that epoch is the *start* epoch, not what we want.  Fix
this code to always subscribe to what we have (+1), and ensure we keep
asking for more until we catch up to what we know we should eventually
get.

Bug: #3075
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit e09b26555c6132ffce08b565780a39e4177cbc1c)

13 years ago objecter: send queued requests when we get first osdmap
Sage Weil [Mon, 27 Aug 2012 14:38:34 +0000 (07:38 -0700)]
objecter: send queued requests when we get first osdmap

If we get our first osdmap and already have requests queued, send them.

Backported from 8d1efd1b829ae50eab7f7f4c07da04e03fce7c45.

Fixes: #3050
Signed-off-by: Sage Weil <sage@inktank.com>

13 years ago objecter: use ordered map<> for tracking tids to preserve order on resend
Sage Weil [Wed, 22 Aug 2012 04:12:33 +0000 (21:12 -0700)]
objecter: use ordered map<> for tracking tids to preserve order on resend

We are using a hash_map<> to map tids to Op*'s.  In handle_osd_map(),
we will recalc_op_target() on each Op in a random (hash) order.  These
will get put in a temp map<tid,Op*> to ensure they are resent in the
correct order, but their order on the session->ops list will be random.

Then later, if we reset an OSD connection, we will resend everything for
that session in ops order, which is incorrect.

Fix this by explicitly reordering the requests to resend in
kick_requests(), much like we do in handle_osd_map().  This lets us
continue to use a hash_map<>, which is faster for reasonable numbers of
requests.  A simpler but slower fix would be to just use map<> instead.

This is one of many bugs contributing to #2947.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 1113a6c56739a56871f01fa13da881dab36a32c4)
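
The reordering described above amounts to sorting by tid before resending. A minimal Python sketch, assuming a hypothetical resend_order helper and a plain dict standing in for the hash_map of tids to ops:

```python
def resend_order(ops_by_tid):
    """Return the ops sorted by transaction id (tid), lowest first."""
    # The dict plays the role of the unordered hash map; sorting its keys
    # restores submission order regardless of how entries were stored.
    return [ops_by_tid[tid] for tid in sorted(ops_by_tid)]
```

Sorting only at resend time keeps the fast unordered lookup for the common path, which matches the trade-off the commit message describes.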

13 years ago rbd: force all exiting paths through main()/return
Dan Mick [Mon, 20 Aug 2012 22:02:57 +0000 (15:02 -0700)]
rbd: force all exiting paths through main()/return

This properly destroys objects.  In the process, remove usage_exit();
also kill error-handling in set_conf_param (never relevant for rbd.cc,
and if you call it with both pointers NULL, well...)
Also switch to EXIT_FAILURE for consistency.

Backported from fed8aea662bf919f35a5a72e4e2a2a685af2b2ed.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Fixes: #2948

13 years ago rbd: only open the destination pool for import
Josh Durgin [Tue, 18 Sep 2012 16:37:44 +0000 (09:37 -0700)]
rbd: only open the destination pool for import

Otherwise importing into another pool when the default pool, rbd,
doesn't exist results in an error trying to open the rbd pool.

Reported-by: Sébastien Han <han.sebastien@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

13 years ago ceph-disk-activate, upstart: Use "initctl emit" to start OSDs.
Tommi Virtanen [Mon, 17 Sep 2012 15:55:14 +0000 (08:55 -0700)]
ceph-disk-activate, upstart: Use "initctl emit" to start OSDs.

This avoids an error if the daemon was running already, and is
already being done with the other services.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

13 years ago rbd: make --pool/--image args easier to understand for import
Josh Durgin [Sat, 15 Sep 2012 00:13:57 +0000 (17:13 -0700)]
rbd: make --pool/--image args easier to understand for import

There's no need to set the default pool in set_pool_image_name - this
is done later, in a way that doesn't ignore --pool if --dest-pool
is not specified.

This means --pool and --image can be used with import, just like
the rest of the commands. Without this change, --dest and --dest-pool
had to be used, and --pool would be silently ignored for rbd import.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

13 years ago ceph-create-keys: Create a bootstrap-osd key too.
Tommi Virtanen [Thu, 13 Sep 2012 21:06:04 +0000 (14:06 -0700)]
ceph-create-keys: Create a bootstrap-osd key too.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

13 years ago ceph-create-keys: Refactor to share wait_for_quorum call.
Tommi Virtanen [Thu, 13 Sep 2012 18:34:03 +0000 (11:34 -0700)]
ceph-create-keys: Refactor to share wait_for_quorum call.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

13 years ago objecter: fix skipped map handling
Sage Weil [Wed, 12 Sep 2012 18:38:07 +0000 (11:38 -0700)]
objecter: fix skipped map handling

If we skip a map, we want to translate NO_ACTION to NEED_RESEND, but leave
POOL_DNE alone.

Backported from 2a3b7961c021b19a035f8a6cc4fc3cc90f88f367.

Signed-off-by: Sage Weil <sage@inktank.com>

13 years ago librbd, cls_rbd: close snapshot creation race with old format
Josh Durgin [Mon, 30 Jul 2012 22:19:29 +0000 (15:19 -0700)]
librbd, cls_rbd: close snapshot creation race with old format

If two clients created a snapshot at the same time, the one with the
higher snapshot id might be created first, so the lower snapshot id
would be added to the snapshot context and the snapshot seq would be
set to the lower one.

Instead of allowing this to happen, return -ESTALE if the snapshot id
is lower than the currently stored snapshot sequence number. On the
client side, get a new id and retry if this error is encountered.

Backport: argonaut
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
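
The check-and-retry scheme described above can be sketched in Python. The SnapStore class, the create_snap helper, and the max_tries limit are hypothetical stand-ins for the server-side class method and the librbd client; only the -ESTALE rule comes from the commit message.

```python
import errno


class SnapStore:
    """Toy stand-in for the stored snapshot state (illustrative only)."""

    def __init__(self):
        self.seq = 0
        self.snaps = []

    def add_snap(self, snap_id):
        # Reject an id lower than the stored snapshot sequence number.
        if snap_id < self.seq:
            return -errno.ESTALE
        self.snaps.append(snap_id)
        self.seq = snap_id
        return 0


def create_snap(store, alloc_id, max_tries=5):
    """Client side: get a new id and retry whenever -ESTALE comes back."""
    for _ in range(max_tries):
        snap_id = alloc_id()
        r = store.add_snap(snap_id)
        if r != -errno.ESTALE:
            return r, snap_id
    return -errno.ESTALE, None
```

A client that lost the race simply allocates a fresh, higher id, so the stored sequence number never moves backwards.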

13 years ago upstart: Give everything a stop on stanza.
Tommi Virtanen [Tue, 11 Sep 2012 23:31:57 +0000 (16:31 -0700)]
upstart: Give everything a stop on stanza.

These are all tasks, and expected to exit somewhat quickly,
but e.g. ceph-create-keys has a loop where it waits for mon
to reach quorum, so it might still be in that loop when the
machine is shut down.

13 years ago upstart: Start mds,mon,radosgw after a reboot.
Tommi Virtanen [Tue, 11 Sep 2012 23:28:41 +0000 (16:28 -0700)]
upstart: Start mds,mon,radosgw after a reboot.

They had no "start on" stanzas, so they didn't get started earlier.

13 years ago upstart: Add ceph-create-keys.conf to package.
Tommi Virtanen [Tue, 11 Sep 2012 22:31:06 +0000 (15:31 -0700)]
upstart: Add ceph-create-keys.conf to package.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

13 years ago obsync: if OrdinaryCallingFormat fails, try SubdomainCallingFormat
Sage Weil [Tue, 11 Sep 2012 21:50:53 +0000 (14:50 -0700)]
obsync: if OrdinaryCallingFormat fails, try SubdomainCallingFormat

This blindly tries the Subdomain calling format if the ordinary method
fails.  In particular, this works around buckets that present a
PermanentRedirect message.

See bug #3128.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Matthew Wodrich <matthew.wodrich@dreamhost.com>

13 years ago librbd: add test for discard of nonexistent objects
Sage Weil [Fri, 17 Aug 2012 23:04:20 +0000 (16:04 -0700)]
librbd: add test for discard of nonexistent objects

This verifies librbd properly handles ENOENT during discard.

Signed-off-by: Sage Weil <sage@inktank.com>

13 years ago librbd: ignore -ENOENT during discard
Josh Durgin [Mon, 10 Sep 2012 20:19:53 +0000 (13:19 -0700)]
librbd: ignore -ENOENT during discard

This is a backport of a3ad98a3eef062e9ed51dd2d1e58c593e12c9703

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

13 years ago objectcacher: fix bh leak on discard
Sage Weil [Thu, 16 Aug 2012 01:42:56 +0000 (18:42 -0700)]
objectcacher: fix bh leak on discard

Fixes: #2950
Signed-off-by: Sage Weil <sage@inktank.com>

13 years ago upstart, ceph-create-keys: Make client.admin key generation automatic.
Tommi Virtanen [Thu, 30 Aug 2012 14:16:52 +0000 (10:16 -0400)]
upstart, ceph-create-keys: Make client.admin key generation automatic.

This should help simplify Chef etc deployments. Now (when using the
Upstart jobs), when a ceph-mon is started, ceph-create-admin-key is
triggered. If /etc/ceph/$cluster.client.admin.keyring already exists,
it does nothing; otherwise, it waits for ceph-mon to reach quorum, and
then does a "ceph auth get-or-create" to create the key, and writes it
atomically to disk.

The equivalent code can be removed from the Chef cookbook once this is
in.

Signed-off-by: Tommi Virtanen <tv@inktank.com>
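
The "do nothing if the keyring exists, otherwise write it atomically" behaviour described above can be sketched in Python. The helper names, the temporary-file naming, and the create_key callback (standing in for the quorum wait and the "ceph auth get-or-create" call) are assumptions, not the ceph-create-keys source.

```python
import os


def write_atomically(path, data):
    """Write data to a temp file, then rename it over path (POSIX-atomic)."""
    tmp = "{}.{}.tmp".format(path, os.getpid())
    with open(tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # rename() is atomic when tmp and path are on the same filesystem, so
    # readers see either no keyring or a complete one, never a partial file.
    os.rename(tmp, path)


def ensure_key(path, create_key):
    """Create the keyring only if it is missing; return True if written."""
    if os.path.exists(path):
        return False
    write_atomically(path, create_key())
    return True
```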

13 years ago config: Add a per-name default keyring to front of keyring search path.
Tommi Virtanen [Thu, 30 Aug 2012 14:21:29 +0000 (10:21 -0400)]
config: Add a per-name default keyring to front of keyring search path.

This lets us have e.g. /etc/ceph/ceph.client.admin.keyring that is
owned by root:admin and mode u=rw,g=r,o= without making every non-root
run of the command line tools complain and fail.

This is what the Chef cookbook has been doing for a while already.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

13 years ago upstart: Make instance jobs export their cluster and id variables.
Tommi Virtanen [Thu, 30 Aug 2012 14:11:09 +0000 (10:11 -0400)]
upstart: Make instance jobs export their cluster and id variables.

This allows other jobs listening to Upstart "started ceph-mon" events
to see what instance started.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

13 years ago upstart: Make ceph-osd always set the crush location.
Tommi Virtanen [Thu, 12 Jul 2012 17:47:29 +0000 (10:47 -0700)]
upstart: Make ceph-osd always set the crush location.

This used to be conditional on config having osd_crush_location set,
but with that, minimal configuration left the OSD completely out of
the crush map, and prevented the OSD from starting properly.

Note: Ceph does not currently let this mechanism automatically move
hosts to another location in the CRUSH hierarchy. This means if you
let this run with defaults, setting osd_crush_location later will not
take effect. Set up your config file (or Chef environment) fully
before starting the OSDs the first time.

Signed-off-by: Tommi Virtanen <tv@inktank.com>

13 years ago ceph-disk-prepare: Partition and format OSD data disks automatically.
Tommi Virtanen [Tue, 3 Jul 2012 22:24:26 +0000 (15:24 -0700)]
ceph-disk-prepare: Partition and format OSD data disks automatically.

Uses gdisk, as it seems to be the only tool that can automate GPT uuid
changes. Needs to run as root.

Adds Recommends: gdisk to ceph.deb.

Closes: #2547
Signed-off-by: Tommi Virtanen <tv@inktank.com>

13 years ago ceph-disk-prepare: Take fsid from config file.
Tommi Virtanen [Tue, 3 Jul 2012 16:22:28 +0000 (09:22 -0700)]
ceph-disk-prepare: Take fsid from config file.

Closes: #2546.
Signed-off-by: Tommi Virtanen <tv@inktank.com>

13 years ago upstart: fix regex
Tommi Virtanen [Mon, 25 Jun 2012 22:14:33 +0000 (15:14 -0700)]
upstart: fix regex

Signed-off-by: Tommi Virtanen <tv@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>

13 years ago rgw: clear usage map before reading usage
Yehuda Sadeh [Tue, 28 Aug 2012 23:17:21 +0000 (16:17 -0700)]
rgw: clear usage map before reading usage

Fixes: #3057
Since we read usage in chunks we need to clear the
usage map before reading the next chunk, otherwise
we're going to aggregate the old data as well.

Backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
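
The chunked read described above can be sketched in Python. The read_usage helper, the read_chunk callback signature, and the marker values are hypothetical; the point is only that the map is cleared before each chunk is read.

```python
def read_usage(read_chunk, handle):
    """Read usage in chunks, handing each chunk's map to handle().

    read_chunk(marker) is assumed to return (entries, next_marker, truncated).
    """
    usage = {}
    marker = None
    truncated = True
    while truncated:
        # Clear first; otherwise entries from earlier chunks would be
        # aggregated again together with the new ones.
        usage.clear()
        entries, marker, truncated = read_chunk(marker)
        usage.update(entries)
        handle(dict(usage))
```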

13 years ago Don't package crush header files.
Gary Lowell [Thu, 23 Aug 2012 18:48:50 +0000 (11:48 -0700)]
Don't package crush header files.

13 years ago rgw: dump content_range using 64 bit formatters
Yehuda Sadeh [Sat, 18 Aug 2012 00:34:23 +0000 (17:34 -0700)]
rgw: dump content_range using 64 bit formatters

Fixes: #2961
Also make sure that size is 64 bit.

backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
13 years agoRevert "rgw: dump content_range using 64 bit formatters"
Sage Weil [Tue, 21 Aug 2012 17:58:38 +0000 (10:58 -0700)]
Revert "rgw: dump content_range using 64 bit formatters"

This reverts commit faf9fa5744b459abc2eda829a48a4e07b9c97a08.

13 years agorgw: dump content_range using 64 bit formatters
Yehuda Sadeh [Sat, 18 Aug 2012 00:34:23 +0000 (17:34 -0700)]
rgw: dump content_range using 64 bit formatters

Fixes: #2961
Also make sure that size is 64 bit.

backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
13 years agoobsync: add missing package specifier to format_exc
Matthew Wodrich [Wed, 1 Aug 2012 02:13:03 +0000 (19:13 -0700)]
obsync: add missing package specifier to format_exc

Fixes: #2873
Signed-off-by: Matthew Wodrich <matthew.wodrich@dreamhost.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
13 years agofix keyring generation for mds and osd
Danny Kukawka [Thu, 16 Aug 2012 10:56:58 +0000 (12:56 +0200)]
fix keyring generation for mds and osd

Fix the config keys for the OSD/MDS data dirs. As in the documentation
and elsewhere in the scripts, the keys are 'osd data'/'mds data', not
'osd_data'.

In the case of the MDS: if 'mds data' doesn't exist, create it.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
13 years agofix ceph osd create help
Danny Kukawka [Thu, 16 Aug 2012 10:56:32 +0000 (12:56 +0200)]
fix ceph osd create help

Change ceph osd create <osd-id> to ceph osd create <uuid>, since this
is what the command is really doing.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
13 years agomon: simplify logmonitor check_subs; less noise
Sage Weil [Tue, 10 Jul 2012 00:24:19 +0000 (17:24 -0700)]
mon: simplify logmonitor check_subs; less noise

 * simple helper to translate name to id
 * verify sub type is valid in caller
 * assert sub type is valid in method
 * simplify iterator usage

Among other things, this gets rid of this noise in the logs:

2012-07-10 20:51:42.617152 7facb23f1700  1 mon.a@1(peon).log v310 check_sub sub monmap not log type

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agov0.48.1argonaut v0.48.1argonaut
Sage Weil [Mon, 13 Aug 2012 21:58:51 +0000 (14:58 -0700)]
v0.48.1argonaut

13 years agorgw: fix usage trim call encoding
Yehuda Sadeh [Wed, 1 Aug 2012 20:22:38 +0000 (13:22 -0700)]
rgw: fix usage trim call encoding

Fixes: #2841.
The usage trim operation was encoding the wrong op structure (usage read).
Because the two structures partly overlap, this mostly worked, but user
info wasn't encoded.

Backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
13 years agocls_rgw: fix rgw_cls_usage_log_trim_op encode/decode
Yehuda Sadeh [Wed, 8 Aug 2012 22:21:53 +0000 (15:21 -0700)]
cls_rgw: fix rgw_cls_usage_log_trim_op encode/decode

The op was not encoding the user. Add that and reset version
compatibility.
This change affects the command interface and makes the old
radosgw-admin usage trim incompatible. Use of the old
radosgw-admin usage trim should be avoided, as it may
remove more data than requested. In any case, upgraded
server code will not handle trim requests from old clients.

backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
13 years agorgw: expand date format support
Yehuda Sadeh [Tue, 31 Jul 2012 23:17:22 +0000 (16:17 -0700)]
rgw: expand date format support

Relaxing the date format parsing function to allow UTC
instead of GMT.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
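
As an illustration of the relaxed parsing, the sketch below accepts an RFC 1123 style date whose zone is written as either GMT or UTC. The function name is hypothetical and this is Python rather than the rgw parser; it only shows the idea of treating the two zone names as equivalent.

```python
from datetime import datetime, timezone


def parse_http_date(value):
    """Parse e.g. 'Tue, 31 Jul 2012 23:17:22 GMT' (or the same with UTC)."""
    text = value.strip()
    for zone in ("GMT", "UTC"):
        if text.endswith(" " + zone):
            # Drop the trailing " GMT" / " UTC" and parse the remainder.
            stamp = datetime.strptime(text[:-len(zone) - 1],
                                      "%a, %d %b %Y %H:%M:%S")
            return stamp.replace(tzinfo=timezone.utc)
    raise ValueError("unsupported date format: %r" % value)
```

Both spellings produce the same timezone-aware timestamp; any other zone suffix is still rejected.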
13 years agorgw: complete multipart upload can handle chunked encoding
Yehuda Sadeh [Thu, 2 Aug 2012 18:13:05 +0000 (11:13 -0700)]
rgw: complete multipart upload can handle chunked encoding

Fixes: #2878
We now allow complete multipart upload to use chunked encoding
when sending request data. With chunked encoding the HTTP_LENGTH
header is not required.

Backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
13 years agorgw_xml: xml_handle_data() appends data string
Yehuda Sadeh [Wed, 1 Aug 2012 18:19:32 +0000 (11:19 -0700)]
rgw_xml: xml_handle_data() appends data string

Fixes: #2879.
xml_handle_data() appends data to the object instead of just
replacing it. Parsed data can arrive in pieces, specifically
when data is escaped.

Backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
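
The same behavior can be observed with Python's expat bindings, used here purely as an illustration (the class and function names are hypothetical, and this is not the rgw_xml code). The character data handler may be invoked more than once for a single element, so the handler appends instead of assigning.

```python
import xml.parsers.expat


class TextCollector:
    """Collect the character data of one element, piece by piece."""

    def __init__(self, tag):
        self.tag = tag
        self.inside = False
        self.data = ""

    def start(self, name, attrs):
        if name == self.tag:
            self.inside = True

    def end(self, name):
        if name == self.tag:
            self.inside = False

    def chars(self, text):
        if self.inside:
            # Append: "self.data = text" would keep only the last piece
            # when the parser delivers the content in several calls.
            self.data += text


def element_text(document, tag):
    collector = TextCollector(tag)
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = False  # let expat deliver data as it is parsed
    parser.StartElementHandler = collector.start
    parser.EndElementHandler = collector.end
    parser.CharacterDataHandler = collector.chars
    parser.Parse(document, True)
    return collector.data
```

For example, element_text("<Part><ETag>a&amp;b</ETag></Part>", "ETag") returns the full unescaped string regardless of how many callbacks were needed to deliver it.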
13 years agorgw: ETag is unquoted in multipart upload complete
Yehuda Sadeh [Wed, 1 Aug 2012 20:09:41 +0000 (13:09 -0700)]
rgw: ETag is unquoted in multipart upload complete

Fixes #2877.
Removing quotes from ETag before comparing it to what we
have when completing a multipart upload.

Backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
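
A small sketch of the comparison, assuming the client may send the ETag wrapped in double quotes while the stored value is bare. The helper names are hypothetical and this is not the rgw implementation.

```python
def unquote_etag(etag):
    """Strip one pair of surrounding double quotes, if present."""
    etag = etag.strip()
    if len(etag) >= 2 and etag[0] == '"' and etag[-1] == '"':
        return etag[1:-1]
    return etag


def etags_match(client_etag, stored_etag):
    """Compare ETags after normalizing away optional quoting."""
    return unquote_etag(client_etag) == unquote_etag(stored_etag)
```

Normalizing both sides keeps the comparison symmetric, so it does not matter which of the two values carried the quotes.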
13 years agoMonMap: return error on failure in build_initial
Josh Durgin [Wed, 8 Aug 2012 22:24:57 +0000 (15:24 -0700)]
MonMap: return error on failure in build_initial

If mon_host fails to parse, return an error instead of success.
This avoids failing later on an assert monmap.size() > 0 in the
monmap in MonClient.

Fixes: #2913
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
13 years agoaddr_parsing: report correct error message
Josh Durgin [Wed, 8 Aug 2012 22:10:27 +0000 (15:10 -0700)]
addr_parsing: report correct error message

getaddrinfo uses its return code to report failures.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
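
For illustration only: Python's socket module surfaces the same convention, raising socket.gaierror with the getaddrinfo return code (an EAI_* value) rather than a C errno. The helper name below is hypothetical; AI_NUMERICHOST is used so the example does not perform a DNS lookup.

```python
import socket


def resolve_numeric(host):
    """Return (addresses, error_message); exactly one of the two is None."""
    try:
        infos = socket.getaddrinfo(host, None, flags=socket.AI_NUMERICHOST)
    except socket.gaierror as exc:
        # exc.errno holds the getaddrinfo return code, not an errno value,
        # and exc.strerror holds the text that describes that code.
        return None, "%s (getaddrinfo code %d)" % (exc.strerror, exc.errno)
    # Each entry is (family, type, proto, canonname, sockaddr).
    return sorted({info[4][0] for info in infos}), None
```

Reporting the message derived from the return code, instead of whatever errno happens to contain, is what gives the caller the correct error text.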
13 years agomkcephfs: use default osd_data, _journal values
Sage Weil [Wed, 8 Aug 2012 21:01:53 +0000 (14:01 -0700)]
mkcephfs: use default osd_data, _journal values

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
13 years agomkcephfs: use new default keyring locations
Sage Weil [Wed, 8 Aug 2012 21:01:35 +0000 (14:01 -0700)]
mkcephfs: use new default keyring locations

The ceph-conf command only parses the conf; it does not apply default
config values.  This breaks mkcephfs if values are not specified in the
config.

Let ceph-osd create its own key, fix copying, and fix creation/copying for
the mds.

Fixes: #2845
Reported-by: Florian Haas <florian@hastexo.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
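
The gap described above, a lookup that parses the config file without applying built-in defaults, can be sketched as follows. This uses Python's configparser as a stand-in; the function name, the section name, and the default values in the dictionary are hypothetical placeholders, not Ceph's actual defaults or the ceph-conf implementation.

```python
import configparser

# Hypothetical defaults, for illustration only.
ASSUMED_DEFAULTS = {
    "osd data": "/example/default/osd-data",
    "keyring": "/example/default/keyring",
}


def lookup(conf_text, section, key, defaults):
    """Return the configured value, falling back to a default when absent."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if parser.has_option(section, key):
        return parser.get(section, key)
    # A parse-only lookup stops here with nothing; applying the default
    # is what keeps a minimal config file usable.
    return defaults[key]


minimal_conf = "[osd.0]\nhost = node1\n"
explicit_conf = "[osd.0]\nhost = node1\nosd data = /srv/osd0\n"
```

With minimal_conf the lookup falls back to the assumed default; with explicit_conf the value from the file wins.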
13 years agoosd: peering: detect when log source osd goes down
Sage Weil [Tue, 31 Jul 2012 21:01:57 +0000 (14:01 -0700)]
osd: peering: detect when log source osd goes down

The Peering state has a generic check based on the prior set osds that
will restart peering if one of them goes down (or one of the interesting
down ones comes up).  The GetLog state, however, can pull the log from
a peer that is not in the prior set if it got a notify from them (e.g., an
osd in an old interval that was down when the prior set was calculated).
If that osd goes down, we don't detect it and will block forever.

Fix by adding a simple check in GetLog for the newest_update_osd going
down.

(BTW GetMissing does not suffer from this problem because
peer_missing_requested is a subset of the prior set, so the Peering check
is sufficient.)

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
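
A heavily simplified model of the two checks, written in Python for illustration. The function, its parameters, and the use of plain strings for state names are assumptions made for this sketch; the real logic lives in the OSD's peering state machine and is more involved.

```python
def restart_on_osd_down(state, prior_set_up, newest_update_osd, downed_osd):
    """Decide whether peering must restart when `downed_osd` goes down.

    prior_set_up: ids of prior-set OSDs that were up (simplified model).
    newest_update_osd: id of the peer the log is being pulled from.
    """
    # Generic Peering check: a prior-set OSD went down.
    if downed_osd in prior_set_up:
        return True
    # Added check: in GetLog the log source may be outside the prior set,
    # so it has to be watched separately.
    if state == "GetLog" and downed_osd == newest_update_osd:
        return True
    return False
```

In this model, losing OSD 7 while in GetLog with 7 as the log source restarts peering even though 7 is not in the prior set; without the second check that event would go unnoticed.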
13 years agorbd: fix off-by-one error in key name
Sylvain Munaut [Tue, 31 Jul 2012 18:55:56 +0000 (11:55 -0700)]
rbd: fix off-by-one error in key name

Fixes: #2846
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
13 years agosecret: return error on empty secret
Sylvain Munaut [Tue, 31 Jul 2012 18:54:29 +0000 (11:54 -0700)]
secret: return error on empty secret

Signed-off-by: Sylvain Munaut <tnt@246tNt.com>