Loic Dachary [Wed, 6 May 2015 18:14:37 +0000 (20:14 +0200)]
tests: ceph-helpers kill_daemons fails when kill fails
Instead of silently leaving the daemons running, it returns failure so
the caller can decide what to do with this situation. The timeout is
also extended to minutes instead of seconds to gracefully handle the
rare situations when a machine is extra slow for some reason.
Ken Dreyer [Thu, 30 Apr 2015 21:53:22 +0000 (15:53 -0600)]
packaging: mv ceph-objectstore-tool to main ceph pkg
This change ensures that the ceph-objectstore-tool utility is present on
all OSDs. This makes it easier for users to run this tool to do manual
debugging/recovery in some scenarios.
http://tracker.ceph.com/issues/11376 Refs: #11376
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
(cherry picked from commit 61cf5da0b51e2d9578c7b4bca85184317e30f4ca)
Conflicts:
debian/control
because file layout changes from ceph-test and ceph << 0.94.1-46
Loic Dachary [Thu, 7 May 2015 17:03:16 +0000 (19:03 +0200)]
Merge pull request #4559 from dachary/wip-11429-hammer
OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmap
Samuel Just [Tue, 21 Apr 2015 06:45:57 +0000 (23:45 -0700)]
OSD: handle the case where we resurrected an old, deleted pg
Prior to giant, we would skip pgs in load_pgs which were not present in
the current osdmap. Those pgs would eventually refer to very old
osdmaps which we no longer have, causing the assertion failure in 11429
once the osd is finally upgraded to a version which does not skip the
pgs. Instead, if we do not have the map for the pg epoch, complain to
the osd log and skip the pg.
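
The skip-and-log behavior described above can be sketched roughly as follows. This is an illustrative Python model, not the OSD's C++ code: `load_pgs` here, `stored_maps`, and the PG-to-epoch mapping are hypothetical stand-ins for the real OSD state.

```python
import logging

log = logging.getLogger("osd")


def load_pgs(pgs, stored_maps):
    """Return the ids of the PGs that can be loaded.

    pgs: dict mapping a PG id to the osdmap epoch the PG refers to.
    stored_maps: set of osdmap epochs that are still available locally.
    Both structures are hypothetical simplifications.
    """
    loaded = []
    for pgid, epoch in sorted(pgs.items()):
        if epoch not in stored_maps:
            # Complain to the log and skip, instead of asserting on the
            # missing map.
            log.error("skipping pg %s: no osdmap for epoch %d", pgid, epoch)
            continue
        loaded.append(pgid)
    return loaded
```

With this shape, a resurrected PG that points at a long-trimmed epoch is reported and left out, while the remaining PGs load normally.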
Dmytro Iurchenko [Mon, 16 Feb 2015 16:47:59 +0000 (18:47 +0200)]
rgw: Swift API. Complement the response to "show container details"
OpenStack Object Storage API v1 states that X-Container-Object-Count, X-Container-Bytes-Used and user-defined metadata headers should be included in a response.
Owen Synge [Tue, 17 Mar 2015 14:41:33 +0000 (15:41 +0100)]
Fix "disk zap" sgdisk invocation
Fixes #11143
If the metadata on the disk is truly invalid, sgdisk would fail to zero
it in one go, because --mbrtogpt apparently tried to operate on the
metadata it read before executing --zap-all.
Splitting this up into two separate invocations to first zap everything
and then clear it properly fixes this issue.
Based on patch by Lars Marowsky-Bree <lmb@suse.com> in ceph-deploy.
Created by Vincent Untz <vuntz@suse.com>
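
A minimal Python sketch of the two-step invocation described in the "disk zap" commit above. `zap_commands` and `zap` are hypothetical helper names, and the use of `--clear` in the second step is an assumption: the commit message only says the disk is zapped first and then cleared.

```python
import subprocess


def zap_commands(dev):
    """Build the two sgdisk invocations, in the order they must run."""
    return [
        # Step 1: destroy whatever metadata is on the disk, valid or not.
        ["sgdisk", "--zap-all", "--", dev],
        # Step 2 (assumed flags): write a fresh, empty GPT on the blank disk.
        ["sgdisk", "--clear", "--mbrtogpt", "--", dev],
    ]


def zap(dev):
    """Run both steps; check_call raises CalledProcessError if one fails."""
    for cmd in zap_commands(dev):
        subprocess.check_call(cmd)
```

Keeping `--zap-all` out of the second command is the point of the split: `--mbrtogpt` then never sees the invalid metadata.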
Yehuda Sadeh [Fri, 27 Mar 2015 23:32:48 +0000 (16:32 -0700)]
rgw: generate new tag for object when setting object attrs
Fixes: #11256
Backport: firefly, hammer
Beforehand we were reusing the object's tag, which is problematic as
this tag is used for bucket index updates, and we might be clobbering a
racing update (like object removal).
Objects that start with underscore need to have an object locator,
this is due to an old behavior that we need to retain. Some objects
might have been created without the locator. This tool creates a new
rados object with the appropriate locator.
max_req_id was moved to RGWRados and changed to atomic64_t.
The same request id resulted in gc giving the same idtag to all objects,
resulting in a leakage of rados objects. It only kept the last deleted object in
its queue; the previous objects were never freed.
Boris Ranto [Mon, 13 Apr 2015 13:07:03 +0000 (15:07 +0200)]
Rework mds/Makefile.am to support a dencoder client build
The patch adds all the mds sources to DENCODER_SOURCES to allow a
dencoder client build. The patch also splits the Makefile.am file to
better accommodate the change.
Haomai Wang [Fri, 17 Apr 2015 14:07:00 +0000 (22:07 +0800)]
Fix clear_pipe after reaping progress
In pipe.cc:1353 we stop the connection and let the reader and writer threads stop. If the reader and writer quit immediately and queue_reap is called to trigger reaping before we have called "connection_state->clear_pipe(this)" in pipe.cc:1379, an assertion may fail there.
Guang Yang [Fri, 3 Apr 2015 12:27:04 +0000 (12:27 +0000)]
rgw : Issue AIO for next chunk first before flush the (cached) data.
When handling a GET request for a large object (with multiple chunks), the code currently flushes the
cached data first and then issues the AIO request for the next chunk. This can serialize retrieving
from the OSD and sending to the client. This patch switches the order of the two operations.
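
The reordering can be illustrated with a small asyncio sketch. This is only a model of the idea, not the rgw code: `get_object`, `fetch_chunk`, and `send_to_client` are hypothetical names, and asyncio tasks stand in for the AIO requests.

```python
import asyncio


async def get_object(fetch_chunk, send_to_client, num_chunks):
    """Stream num_chunks chunks, overlapping each read with the prior send."""
    if num_chunks == 0:
        return
    data = await fetch_chunk(0)
    for i in range(1, num_chunks):
        # Schedule the read of the next chunk before flushing the cached one.
        pending = asyncio.ensure_future(fetch_chunk(i))
        # Flush the cached chunk while the next read is in flight.
        await send_to_client(data)
        data = await pending
    # Flush the final chunk; nothing is left to prefetch.
    await send_to_client(data)
```

Chunks still reach the client in order; only the point at which the next read is issued moves earlier.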
Dencoder is built if ENABLE_CLIENT is set. However, the rgw/Makefile.am
populated DENCODER_SOURCES only if WITH_RADOSGW was set. The patch fixes
this and populates DENCODER_SOURCES if ENABLE_CLIENT is set.
Loic Dachary [Sun, 8 Mar 2015 14:15:35 +0000 (15:15 +0100)]
ceph-disk: more robust parted output parser
In some cases, depending on the implementation or the operating system,
parted --machine -- /dev/sdh print
may contain empty lines. The current parsing code is fragile and highly
depends on output details. Replace it with code that basically does the
same sanity checks (output not empty, existence of units, existence of
the dev entry) but handles the entire output instead of checking line by
line.
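
A rough Python sketch of whole-output parsing with the three sanity checks listed above. It is not the ceph-disk code: `parse_parted_partition_numbers` is a hypothetical name, and the unit markers `BYT;`, `CHS;` and `CYL;` are assumed to be the ones parted prints in machine mode.

```python
import re


def parse_parted_partition_numbers(output, dev):
    """Return partition numbers found in `parted --machine -- DEV print` output."""
    if not output.strip():
        raise ValueError("parted produced no output")
    if not any(unit in output for unit in ("BYT;", "CHS;", "CYL;")):
        raise ValueError("parted output has no units line")
    if dev not in output:
        raise ValueError("parted output has no entry for %s" % dev)
    # Everything after the device entry describes partitions. Matching on the
    # whole text means empty lines in between are simply ignored.
    _, partitions = output.split(dev, 1)
    return [int(n) for n in re.findall(r"^(\d+):", partitions, re.MULTILINE)]
```

Because the checks run against the entire output, an extra blank line before or after the device entry no longer changes the result.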
Jianpeng Ma [Fri, 6 Mar 2015 03:26:31 +0000 (11:26 +0800)]
osdc: add epoch_t last_force_resend in Op/LingerOp.
Use this field to record pg_pool_t::last_force_op_resend, to avoid
resending the op endlessly when the osd replies with a redirect.
Fixes: #11026
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit def4fc4ae51174ae92ac1fb606427f4f6f00743e)
Jason Dillaman [Tue, 7 Apr 2015 19:39:13 +0000 (15:39 -0400)]
librbd: moved snap_create header update notification to initiator
When handling a proxied snap_create operation, the client which
invoked the snap_create should send the header update notification
to avoid a possible race condition where snap_create completes but
the client doesn't see the new snapshot (since it didn't yet receive
the notification).
Jason Dillaman [Wed, 22 Apr 2015 15:27:35 +0000 (11:27 -0400)]
librbd: updated cache max objects calculation
The previous calculation was based upon the image's object size.
Since the cache stores smaller bufferheads, the object size is not
a good indicator of cache usage and was resulting in objects being
evicted from the cache too often. Instead, base the max number of
objects on the memory load required to store the extra metadata
for the objects.
Jason Dillaman [Fri, 13 Mar 2015 22:08:47 +0000 (18:08 -0400)]
librados_test_stub: AIO operation callbacks should be via Finisher
librados will execute all AIO callbacks via a single finisher to
prevent blocking the Objecter. Reproduce this behavior to avoid
deadlocks that only exist when using the test stub.
Ken Dreyer [Wed, 22 Apr 2015 22:36:42 +0000 (16:36 -0600)]
init-radosgw: run RGW as root
The ceph-radosgw service fails to start if the httpd package is not
installed. This is because the init.d file attempts to start the RGW
process with the "apache" UID. If a user is running civetweb, there is
no reason for the httpd or apache2 package to be present on the system.
Switch the init scripts to use "root" as is done on Ubuntu.
http://tracker.ceph.com/issues/11453 Refs: #11453
Reported-by: Vickey Singh <vickey.singh22693@gmail.com>
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
(cherry picked from commit 47339c5ac352d305e68a58f3d744c3ce0fd3a2ac)
Haomai Wang [Sun, 22 Mar 2015 15:59:19 +0000 (23:59 +0800)]
Fix ceph_test_async_driver failed
This test creates 10000 sockets, which fails because of the limited number of system file descriptors. Creating a few hundred sockets is enough to achieve the test's goal.
Cherry picking the Hammer release notes cannot be done cleanly, so they are
copy/pasted instead. This will allow cherry-picking the release notes
for the next point releases. It should be undisturbed by the release
notes for other point releases because they modify parts of the file
that will not generate cherry-pick conflicts.
Sage Weil [Fri, 10 Apr 2015 15:43:45 +0000 (08:43 -0700)]
crush: fix has_v4_buckets()
alg, not type!
This bug made us incorrectly think we were using v4 features when user type
5 was being used. That's currently 'rack' with recent crush maps, but
was other types for clusters that were created with older versions. This
is clearly problematic as it will lock out non-hammer clients incorrectly,
breaking deployments on upgrade.
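
The "alg, not type" mix-up can be shown with a simplified Python model. The `Bucket` record and both functions are hypothetical; the numeric value 5 for the v4-only algorithm is inferred from the message above, not taken from the CRUSH headers.

```python
from collections import namedtuple

# Hypothetical, simplified bucket record: `alg` is the bucket algorithm,
# `type` is the user-defined hierarchy level (host, rack, ...).
Bucket = namedtuple("Bucket", ["alg", "type"])

# Assumed numeric id of the algorithm that requires v4 features.
V4_BUCKET_ALG = 5


def has_v4_buckets(buckets):
    """Fixed check: look at the bucket algorithm."""
    return any(b.alg == V4_BUCKET_ALG for b in buckets)


def has_v4_buckets_buggy(buckets):
    """Buggy check: compares the user type, so any type-5 bucket matches."""
    return any(b.type == V4_BUCKET_ALG for b in buckets)
```

A map containing `Bucket(alg=4, type=5)` uses no v4 feature, yet the buggy check reports one, which is how non-hammer clients ended up locked out.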
Guang Yang [Thu, 26 Feb 2015 08:13:12 +0000 (08:13 +0000)]
osd: fix negative degraded objects during backfilling
When there are delete requests during backfilling, the reported number of degraded
objects could be negative, as the primary's num_objects is the latest (locally) but
the number for replicas might not reflect the deletions. A simple fix is to ignore
the negative subtracted value.
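
A minimal Python sketch of "ignore the negative subtracted value". The function name and the flat list of replica counts are hypothetical simplifications of the real PG statistics.

```python
def degraded_objects(primary_num_objects, replica_num_objects):
    """Sum, over replicas, the objects each replica is missing.

    A replica that has not yet processed deletions can report more objects
    than the primary; that negative difference is ignored instead of being
    subtracted from the total.
    """
    degraded = 0
    for replica in replica_num_objects:
        missing = primary_num_objects - replica
        if missing > 0:
            degraded += missing
    return degraded
```

For example, with 90 objects on the primary and replicas reporting 100 and 80, only the second replica contributes and the total is 10 rather than 0.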
This can be done better in a separate script, which puts these in
CEPH_EXTRA_CONFIGURE_ARGS. In particular, this lets us enable
lttng for gitbuilder builds, but not release builds.
Jason Dillaman [Mon, 16 Mar 2015 22:40:49 +0000 (18:40 -0400)]
librbd: snap_remove should ignore -ENOENT errors
If the attempt to deregister the snapshot from the parent
image fails with -ENOENT, ignore the error as it is safe
to assume that the child is not associated with the parent.
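
The error handling can be sketched in Python as follows. This is not the librbd code: `snap_remove` here takes two hypothetical callables that return 0 on success or a negative errno value, mirroring the C convention.

```python
import errno


def snap_remove(deregister_from_parent, remove_local_snapshot):
    """Remove a snapshot, tolerating a child that is already deregistered."""
    r = deregister_from_parent()
    # -ENOENT means the parent has no record of this child, so there is
    # nothing to undo; any other error is still fatal.
    if r < 0 and r != -errno.ENOENT:
        return r
    return remove_local_snapshot()
```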
Samuel Just [Thu, 26 Mar 2015 17:26:48 +0000 (10:26 -0700)]
ReplicatedPG::cancel_pull: requeue waiters as well
If we are in recovery_wait, we might not recover that object as part of
recover_primary for some time. Worse, if we are waiting on a backfill
which is blocked waiting on a copy_from on the missing object in
question, it can become a deadlock.
Fixes: 11244
Backport: firefly
Signed-off-by: Samuel Just <sjust@redhat.com>
Sage Weil [Fri, 27 Mar 2015 22:35:21 +0000 (15:35 -0700)]
common: send cluster log messages to 'cluster' channel by default
The CLOG_CHANNEL_DEFAULT constant was being abused for two purposes:
- the default channel to log messages to
- the name of the config option key in the key/value pair string that is
used for the default option, e.g. "default=true foo=false bar=false"
Fix this by making the config option key CLOG_CONFIG_DEFAULT_KEY and
replacing throughout, and changing CLOG_CHANNEL_DEFAULT to "cluster" (as
it should be and has been historically).
Fixes: #11177
Signed-off-by: Sage Weil <sage@redhat.com>
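
To show why the two roles need separate constants, here is a small Python sketch of parsing the key/value option string. The constant names come from the commit message; the value "default" for the config key is taken from the example string, and the helper functions are hypothetical.

```python
# Channel that messages go to when none is named.
CLOG_CHANNEL_DEFAULT = "cluster"
# Key in the option string that supplies the fallback value.
CLOG_CONFIG_DEFAULT_KEY = "default"


def parse_channel_option(value):
    """Parse "default=true foo=false" into {"default": "true", "foo": "false"}."""
    options = {}
    for pair in value.split():
        key, _, val = pair.partition("=")
        options[key] = val
    return options


def option_for_channel(value, channel=CLOG_CHANNEL_DEFAULT):
    """Look up a channel's setting, falling back to the default key."""
    options = parse_channel_option(value)
    return options.get(channel, options.get(CLOG_CONFIG_DEFAULT_KEY))
```

With a single constant serving both purposes, changing the default channel name would also have changed which key the parser treats as the fallback.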