Jason Dillaman [Mon, 27 Apr 2015 07:42:24 +0000 (03:42 -0400)]
librbd: update ref count when queueing AioCompletion
If the client releases the AioCompletion while librbd is waiting
to acquire the exclusive lock, the memory associated with the
completion will be freed too early.
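
A minimal sketch of the idea, not the librbd implementation: the queue takes its own reference before parking the completion, so an early client release cannot free it. The `Completion` and `PendingQueue` types and their members are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for an AIO completion with manual ref counting.
struct Completion {
  int refs = 1;   // starts with the client's reference
  bool *freed;    // set to true when the object is destroyed (illustration only)
  explicit Completion(bool *f) : freed(f) {}
  void get() { ++refs; }
  void put() {
    assert(refs > 0);
    if (--refs == 0) {
      *freed = true;
      delete this;
    }
  }
};

// Hypothetical queue of requests waiting for the exclusive lock.
struct PendingQueue {
  std::deque<Completion *> waiting;
  void enqueue(Completion *c) {
    c->get();  // the queue holds its own reference while the request waits
    waiting.push_back(c);
  }
  void drain() {
    while (!waiting.empty()) {
      Completion *c = waiting.front();
      waiting.pop_front();
      c->put();  // drop the queue's reference once the request is handled
    }
  }
};
```

With this shape, a client calling `put()` right after `enqueue()` leaves the completion alive until `drain()` runs.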
Yehuda Sadeh [Fri, 27 Mar 2015 23:32:48 +0000 (16:32 -0700)]
rgw: generate new tag for object when setting object attrs
Fixes: #11256
Backport: firefly, hammer
Beforehand we were reusing the object's tag, which is problematic as
this tag is used for bucket index updates, and we might be clobbering a
racing update (like object removal).
Objects that start with underscore need to have an object locator,
this is due to an old behavior that we need to retain. Some objects
might have been created without the locator. This tool creates a new
rados object with the appropriate locator.
max_req_id was moved to RGWRados and changed to atomic64_t.
Reusing the same request id made gc give the same idtag to all objects,
which leaked rados objects: gc kept only the last deleted object in
its queue, and the previous objects were never freed.
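
A rough sketch of why a shared atomic counter avoids duplicate tags. The `Store` type, `next_req_id()` and `make_idtag()` are hypothetical names, and `std::atomic<uint64_t>` stands in for atomic64_t; the real tag format is not shown in this log.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Hypothetical store-wide state: one counter shared by all requests.
struct Store {
  std::atomic<uint64_t> max_req_id{0};
  // Atomic pre-increment returns the new value, so no two callers see the same id.
  uint64_t next_req_id() { return ++max_req_id; }
};

// Hypothetical tag derived from the request id (format is an assumption).
std::string make_idtag(Store &s) {
  return "req." + std::to_string(s.next_req_id());
}
```

Two consecutive calls to `make_idtag()` on the same `Store` yield different tags, so gc entries no longer overwrite each other.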
Boris Ranto [Mon, 13 Apr 2015 13:07:03 +0000 (15:07 +0200)]
Rework mds/Makefile.am to support a dencoder client build
The patch adds all the mds sources to DENCODER_SOURCES to allow a
dencoder client build. The patch also splits the Makefile.am file to
better accommodate the change.
Haomai Wang [Fri, 17 Apr 2015 14:07:00 +0000 (22:07 +0800)]
Fix clear_pipe after reaping progress
In pipe.cc:1353 we stop this connection and let the reader and writer threads stop. If the reader and writer quit quickly and we call queue_reap to trigger reaping before "connection_state->clear_pipe(this)" has been called at pipe.cc:1379, we may hit an assertion failure here.
Guang Yang [Fri, 3 Apr 2015 12:27:04 +0000 (12:27 +0000)]
rgw : Issue AIO for next chunk first before flush the (cached) data.
When handling a GET request for a large object (with multiple chunks), the code currently flushes the
cached data first and then issues the AIO request for the next chunk. This can serialize retrieving
from the OSD and sending to the client. This patch swaps the two operations.
Dencoder is built if ENABLE_CLIENT is set. However, the rgw/Makefile.am
populated DENCODER_SOURCES only if WITH_RADOSGW was set. The patch fixes
this and populates DENCODER_SOURCES if ENABLE_CLIENT is set.
Loic Dachary [Sun, 8 Mar 2015 14:15:35 +0000 (15:15 +0100)]
ceph-disk: more robust parted output parser
In some cases, depending on the implementation or the operating system,
parted --machine -- /dev/sdh print
may contain empty lines. The current parsing code is fragile and highly
depends on output details. Replace it with code that basically does the
same sanity checks (output not empty, existence of units, existence of
the dev entry) but handles the entire output instead of checking line by
line.
Jason Dillaman [Tue, 7 Apr 2015 19:39:13 +0000 (15:39 -0400)]
librbd: moved snap_create header update notification to initiator
When handling a proxied snap_create operation, the client which
invoked the snap_create should send the header update notification
to avoid a possible race condition where snap_create completes but
the client doesn't see the new snapshot (since it didn't yet receive
the notification).
Jason Dillaman [Wed, 22 Apr 2015 15:27:35 +0000 (11:27 -0400)]
librbd: updated cache max objects calculation
The previous calculation was based upon the image's object size.
Since the cache stores smaller bufferheads, the object size is not
a good indicator of cache usage and was resulting in objects being
evicted from the cache too often. Instead, base the max number of
objects on the memory load required to store the extra metadata
for the objects.
Jason Dillaman [Fri, 13 Mar 2015 22:08:47 +0000 (18:08 -0400)]
librados_test_stub: AIO operation callbacks should be via Finisher
librados will execute all AIO callbacks via a single finisher to
prevent blocking the Objecter. Reproduce this behavior to avoid
deadlocks that only exist when using the test stub.
Ken Dreyer [Wed, 22 Apr 2015 22:36:42 +0000 (16:36 -0600)]
init-radosgw: run RGW as root
The ceph-radosgw service fails to start if the httpd package is not
installed. This is because the init.d file attempts to start the RGW
process with the "apache" UID. If a user is running civetweb, there is
no reason for the httpd or apache2 package to be present on the system.
Switch the init scripts to use "root" as is done on Ubuntu.
http://tracker.ceph.com/issues/11453
Refs: #11453
Reported-by: Vickey Singh <vickey.singh22693@gmail.com>
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
(cherry picked from commit 47339c5ac352d305e68a58f3d744c3ce0fd3a2ac)
Haomai Wang [Sun, 22 Mar 2015 15:59:19 +0000 (23:59 +0800)]
Fix ceph_test_async_driver failed
This test creates 10000 sockets, which fails because of the limited number of system fds. Creating several hundred sockets is enough to meet the test's goal.
Cherry-picking the Hammer release notes cannot be done cleanly, so they are
copy/pasted instead. This will allow cherry-picking the release notes
for the next point releases. It should be undisturbed by the release
notes for other point releases because they modify parts of the file
that will not generate cherry-pick conflicts.
Sage Weil [Fri, 10 Apr 2015 15:43:45 +0000 (08:43 -0700)]
crush: fix has_v4_buckets()
alg, not type!
This bug made us incorrectly think we were using v4 features when user type
5 was being used. That's currently 'rack' with recent crush maps, but
was other types for clusters that were created with older versions. This
is clearly problematic as it will lock out non-hammer clients incorrectly,
breaking deployments on upgrade.
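
A small sketch of the distinction, assuming a simplified `Bucket` struct (hypothetical, not the crush_bucket layout) and assuming the v4 bucket algorithm id is 5, the value that collides with user type 5 in the description above.

```cpp
#include <cstdint>
#include <vector>

// Assumed algorithm id for the v4 bucket kind; it collides with user type 5.
constexpr uint8_t kBucketAlgV4 = 5;

// Hypothetical, simplified bucket: two unrelated integer fields.
struct Bucket {
  uint16_t type;  // user-defined hierarchy level (host, rack, ...)
  uint8_t alg;    // bucket selection algorithm
};

// Check the algorithm field, not the user-defined type.
bool has_v4_buckets(const std::vector<Bucket> &buckets) {
  for (const Bucket &b : buckets) {
    if (b.alg == kBucketAlgV4) {
      return true;
    }
  }
  return false;
}
```

A map whose only bucket has type 5 but an older alg is correctly reported as not using v4 features.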
Guang Yang [Thu, 26 Feb 2015 08:13:12 +0000 (08:13 +0000)]
osd: fix negative degraded objects during backfilling
When there are delete requests during backfilling, the reported number of degraded
objects could be negative, as the primary's num_objects is the latest (locally) but
the number for replicas might not reflect the deletions. A simple fix is to ignore
the negative subtracted value.
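
The fix can be sketched as a clamp on the per-replica difference. The helper name `degraded_delta` and its parameters are hypothetical; the actual accounting code is not shown in this log.

```cpp
#include <cstdint>

// Hypothetical helper: how many objects a replica is missing relative to
// the primary. A replica that has not yet applied deletions can report
// more objects than the primary, so a negative difference is ignored.
int64_t degraded_delta(int64_t primary_objects, int64_t replica_objects) {
  int64_t diff = primary_objects - replica_objects;
  return diff > 0 ? diff : 0;
}
```

For example, `degraded_delta(5, 8)` contributes 0 rather than -3 to the degraded total.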
This can be done better in a separate script, which puts these in
CEPH_EXTRA_CONFIGURE_ARGS. In particular, this lets us enable
lttng for gitbuilder builds, but not release builds.
Jason Dillaman [Mon, 16 Mar 2015 22:40:49 +0000 (18:40 -0400)]
librbd: snap_remove should ignore -ENOENT errors
If the attempt to deregister the snapshot from the parent
image fails with -ENOENT, ignore the error as it is safe
to assume that the child is not associated with the parent.
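
A minimal sketch of the error handling, assuming the deregistration step is passed in as a callback that returns 0 or a negative errno. The function name `remove_child_ignoring_enoent` is hypothetical and not part of librbd.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical wrapper around the "deregister from parent" step.
// -ENOENT means the child was not registered, which is the state we want,
// so it is treated as success; any other error is propagated.
int remove_child_ignoring_enoent(const std::function<int()> &deregister) {
  assert(deregister && "callback must be set");
  int r = deregister();
  if (r < 0 && r != -ENOENT) {
    return r;
  }
  return 0;
}
```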
Samuel Just [Thu, 26 Mar 2015 17:26:48 +0000 (10:26 -0700)]
ReplicatedPG::cancel_pull: requeue waiters as well
If we are in recovery_wait, we might not recover that object as part of
recover_primary for some time. Worse, if we are waiting on a backfill
which is blocked waiting on a copy_from on the missing object in
question, it can become a deadlock.
Fixes: 11244
Backport: firefly
Signed-off-by: Samuel Just <sjust@redhat.com>
Sage Weil [Fri, 27 Mar 2015 22:35:21 +0000 (15:35 -0700)]
common: send cluster log messages to 'cluster' channel by default
The CLOG_CHANNEL_DEFAULT constant was being abused for two purposes:
- the default channel to log messages to
- the name of the config option key in the key/value pair string that is
used for the default option, e.g. "default=true foo=false bar=false"
Fix this by making the config option key CLOG_CONFIG_DEFAULT_KEY and
replacing throughout, and changing CLOG_CHANNEL_DEFAULT to "cluster" (as
it should be and has been historically).
Fixes: #11177
Signed-off-by: Sage Weil <sage@redhat.com>
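
To illustrate why the two meanings need separate constants, here is a sketch that parses a key/value option string and falls back to the "default" key. The names `kConfigDefaultKey`, `kChannelDefault`, `parse_kv` and `lookup` are hypothetical stand-ins, and the parsing rules are an assumption based only on the example string above.

```cpp
#include <map>
#include <sstream>
#include <string>

// Stand-ins for the two separate meanings described above.
const std::string kConfigDefaultKey = "default";  // key inside the option string
const std::string kChannelDefault = "cluster";    // channel messages go to by default

// Parse "k1=v1 k2=v2 ..." into a map; tokens without '=' are skipped.
std::map<std::string, std::string> parse_kv(const std::string &s) {
  std::map<std::string, std::string> out;
  std::istringstream in(s);
  std::string tok;
  while (in >> tok) {
    std::string::size_type eq = tok.find('=');
    if (eq == std::string::npos) {
      continue;
    }
    out[tok.substr(0, eq)] = tok.substr(eq + 1);
  }
  return out;
}

// Look up a channel's value, falling back to the "default" entry.
std::string lookup(const std::map<std::string, std::string> &kv,
                   const std::string &channel) {
  std::map<std::string, std::string>::const_iterator it = kv.find(channel);
  if (it != kv.end()) {
    return it->second;
  }
  it = kv.find(kConfigDefaultKey);
  return it != kv.end() ? it->second : std::string();
}
```

Here a lookup for the "cluster" channel in "default=true foo=false bar=false" falls back to the value stored under "default", while "foo" resolves to its own entry.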
Samuel Just [Tue, 24 Mar 2015 17:48:02 +0000 (10:48 -0700)]
PG: set/clear CREATING in Primary state entry/exit
Previously, we did not actually set it when we got a pg creation message from
the mon. It would actually get set on the first start_peering_interval after
that point. If we don't get that far, but do send a stat update to the mon, we
can end up with 11197. Instead, let's just set it and clear it upon entry into
and exit from the Primary state.
Fixes: 11197
Signed-off-by: Samuel Just <sjust@redhat.com>
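
The entry/exit pairing can be sketched with a constructor and destructor standing in for state entry and exit. The `PGState` struct, the `kStateCreating` bit, and the `creating_requested` field are hypothetical; whether the real code sets the flag conditionally is an assumption here.

```cpp
#include <cstdint>

constexpr uint32_t kStateCreating = 1u << 0;  // hypothetical flag bit

// Hypothetical PG bookkeeping.
struct PGState {
  uint32_t flags = 0;
  bool creating_requested = false;  // assumed: set when the mon asked for creation
};

// Hypothetical Primary state: constructor models entry, destructor models exit.
class PrimaryState {
  PGState &pg;

 public:
  explicit PrimaryState(PGState &p) : pg(p) {
    if (pg.creating_requested) {
      pg.flags |= kStateCreating;  // set on entry, not on a later peering interval
    }
  }
  ~PrimaryState() {
    pg.flags &= ~kStateCreating;   // always cleared on exit
  }
};
```

Tying the flag to the state's lifetime means a stat update sent while in Primary always reflects the flag, with no window before the first peering interval.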
Samuel Just [Tue, 24 Mar 2015 22:14:34 +0000 (15:14 -0700)]
ReplicatedPG: trim backfill intervals based on peer's last_backfill_started
Otherwise, we fail to trim the peer's last_backfill_started and get bug 11199.
1) osd 4 backfills up to 31bccdb2/mira01213209-286/head (henceforth: foo)
2) Interval change happens
3) osd 0 now finds itself backfilling to 4 (lb=foo) and osd.5
(lb=b6670ba2/mira01213209-160/snapdir//1, henceforth: bar)
4) recover_backfill causes both 4 and 5 to scan forward, so 4 has an interval
starting at foo, 5 has an interval starting at bar.
5) Once those have come back, recover_backfill attempts to trim off the
last_backfill_started, but 4's interval starts after that, so foo remains in
osd 4's interval (this is the bug)
7) We serve a copyfrom on foo (sent to 4 as well).
8) We eventually get to foo in the backfilling. Normally, they would have the
same version, but of course we don't update osd.4's interval from the log since
it should not have received writes in that interval. Thus, we end up trying to
recover foo on osd.4 anyway.
9) But, an interval change happens between removing foo from osd.4 and
completing the recovery, leaving osd.4 without foo, but with lb >= foo
Fixes: #11199
Backport: firefly
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Fri, 20 Mar 2015 22:28:15 +0000 (15:28 -0700)]
ReplicatedPG::promote_object: check scrubber and block if necessary
Otherwise, we might attempt to promote into an in-progress scrub
interval causing 11156. I would have added a return value to
promote_object(), but could not find an existing user which
cared to distinguish the cases, even with a null op passed.
All existing users are in maybe_handle_cache. The ones which
pass a null op are for promoting the object in parallel
with a proxy -- a case where not actually performing the promote
does not really matter.
Fixes: #11156
Signed-off-by: Samuel Just <sjust@redhat.com>
Currently, this method also returns true if the object is backfilling.
This commit was reverted earlier in the branch in order to make the
other reverts clean. It's actually a nice rename though, so I'm
re-cherry-picking it.
Signed-off-by: Samuel Just <sjust@redhat.com>
Conflicts:
src/osd/ReplicatedPG.cc