Kefu Chai [Fri, 15 May 2015 14:50:36 +0000 (22:50 +0800)]
mon: always reply mdsbeacon
The MDS (Beacon) always expects a reply to the mdsbeacon messages it sends to
the lead mon, and it uses the reply delay as a metric for the laggy-ness of the
Beacon. When the MDSMonitor on a peon sees a reply (routed message) from the
leader, it removes the routed session, so a reply to the mdsbeacon stops the
peon from re-sending the mdsbeacon request to the leader.
If the MDSMonitor re-forwards unreplied requests after they have become
outdated, there is a chance that requests reflecting old or even wrong state of
the MDSs will mislead the lead monitor. For example, the MDSs that sent the
outdated messages could already be dead.
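To picture the routing behaviour described above, here is a minimal, self-contained C++ sketch (not the actual MDSMonitor code; names such as PeonRouter and handle_reply are hypothetical): the peon keeps a table of forwarded requests and resends any that have not been acknowledged, so a reply from the leader is what stops the resend loop.

    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <string>

    // Hypothetical model of a peon forwarding beacons to the leader.
    struct PeonRouter {
      // routed requests still waiting for a reply, keyed by tid
      std::map<uint64_t, std::string> pending;

      void forward_to_leader(uint64_t tid, const std::string &msg) {
        pending[tid] = msg;            // remember it so we can resend later
      }

      // called when the leader's reply is routed back through the peon
      void handle_reply(uint64_t tid) {
        pending.erase(tid);            // reply seen: stop resending this one
      }

      // the peon re-forwards everything still pending; if the leader never
      // replied to an mdsbeacon, stale beacons pile up here and get resent
      void resend_routed_requests() {
        for (const auto &p : pending)
          std::cout << "resending tid " << p.first << ": " << p.second << "\n";
      }
    };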
Sage Weil [Wed, 18 Feb 2015 22:53:04 +0000 (14:53 -0800)]
osd,mon: explicitly specify OSD features in MOSDBoot
We are using the connection features to populate the features field in the
OSDMap, but this is the *intersection* of mon and osd features, not the
osd features. Fix this by explicitly specifying the features in
MOSDBoot.
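As a rough illustration of the problem (a simplified sketch, not Ceph's message definitions; BootMsg and the feature constants are made up): the features visible on the connection are the intersection of both ends, so the map would under-report what the OSD itself supports unless the OSD states its features explicitly in the boot message.

    #include <cassert>
    #include <cstdint>

    // made-up feature bits for illustration
    constexpr uint64_t FEAT_A = 1 << 0;
    constexpr uint64_t FEAT_B = 1 << 1;

    struct BootMsg {
      uint64_t osd_features;   // explicitly carried by the OSD (the fix)
    };

    int main() {
      uint64_t osd_features = FEAT_A | FEAT_B;
      uint64_t mon_features = FEAT_A;

      // what the connection reports: the *intersection* of both ends
      uint64_t conn_features = osd_features & mon_features;
      assert(conn_features == FEAT_A);              // FEAT_B is lost here

      // carrying the features in the boot message preserves them
      BootMsg boot{osd_features};
      assert(boot.osd_features == (FEAT_A | FEAT_B));
      return 0;
    }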
Ken Dreyer [Wed, 10 Jun 2015 21:43:41 +0000 (15:43 -0600)]
ceph.spec.in: package mkcephfs on EL6
Commit efbca0465c2946e113771966df08cf7cf37b1196 added mkcephfs to the
RPM %files listing, but this /usr/sbin path is only correct for CentOS
7. In CentOS 6, the utility is present at /sbin/mkcephfs instead. This
causes rpmbuild to fail to build the tip of the firefly branch on EL6.
Adjust the RPM %files list so we properly package mkcephfs on both EL7
and EL6.
max_req_id was moved to RGWRados and changed to atomic64_t.
Reusing the same request id caused gc to give the same idtag to all objects,
leaking rados objects: only the last deleted object was kept in its queue, and
the previous objects were never freed.
Fixes: #10295
Backport: Hammer, Firefly
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
(cherry picked from commit c262259)
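A minimal sketch of the idea behind the fix (simplified; RGWRados internals are not shown, and next_req_id and make_idtag are hypothetical names): a per-process atomic counter hands out a distinct id to every request, so each deleted object gets its own gc idtag instead of all of them sharing one.

    #include <atomic>
    #include <cstdint>
    #include <string>

    // hypothetical stand-in for the per-RGWRados atomic request counter
    static std::atomic<uint64_t> next_req_id{0};

    // every caller gets a unique id, even from concurrent threads
    std::string make_idtag(const std::string &obj) {
      uint64_t id = ++next_req_id;
      return obj + "#" + std::to_string(id);   // distinct tag per object
    }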
Ken Dreyer [Mon, 18 May 2015 16:50:58 +0000 (10:50 -0600)]
debian: set rest-bench-dbg ceph-test-dbg dependencies
Debian's debug packages ought to depend on their respective binary
packages. This was the case for many of our ceph packages, but it was
not the case for ceph-test-dbg or rest-bench-dbg.
Add the dependencies on the relevant binary packages, pinned to
"= ${binary:Version}" per convention.
Yehuda Sadeh [Thu, 14 May 2015 00:05:22 +0000 (17:05 -0700)]
rgw: merge manifests correctly when there's prefix override
Fixes: #11622
Backport: hammer, firefly
Prefix override happens in a manifest when a rados object does not
conform to the generic prefix set on the manifest. When merging
manifests (specifically, as used in multipart object uploads), we need
to check whether the rule we are trying to merge has the same prefix as
the previous rule. Previously we checked whether both had the same
override_prefix setting, but that may not apply, since the two manifests
might have different prefixes.
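The following simplified sketch (hypothetical types, not the real RGWObjManifest code) shows the distinction: when deciding whether a rule can be merged into the previous one, the comparison has to be on the effective prefix of each rule, not merely on whether both rules override the manifest-wide prefix.

    #include <cstdint>
    #include <string>

    // hypothetical, simplified manifest rule
    struct Rule {
      std::string override_prefix;  // empty means "use the manifest prefix"
      uint64_t stripe_size = 0;
    };

    std::string effective_prefix(const Rule &r, const std::string &manifest_prefix) {
      return r.override_prefix.empty() ? manifest_prefix : r.override_prefix;
    }

    // merge `next` into `prev` only if the layout *and* the prefix match
    bool can_merge(const Rule &prev, const Rule &next,
                   const std::string &prev_manifest_prefix,
                   const std::string &next_manifest_prefix) {
      return prev.stripe_size == next.stripe_size &&
             effective_prefix(prev, prev_manifest_prefix) ==
                 effective_prefix(next, next_manifest_prefix);
    }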
Loic Dachary [Fri, 15 May 2015 15:02:05 +0000 (17:02 +0200)]
Merge pull request #4556 from xinxinsh/wip-11429-firefly
OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmap
Samuel Just [Tue, 21 Apr 2015 06:45:57 +0000 (23:45 -0700)]
OSD: handle the case where we resurrected an old, deleted pg
Prior to giant, we would skip pgs in load_pgs which were not present in
the current osdmap. Those pgs would eventually refer to very old
osdmaps, which we no longer have, causing the assertion failure in 11429
once the osd is finally upgraded to a version which does not skip the
pgs. Instead, if we do not have the map for the pg epoch, complain to
the osd log and skip the pg.
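In outline the change amounts to something like the sketch below (a simplified model, not the real OSD::load_pgs; have_map, PGInfo and the surrounding types are stand-ins): if the osdmap for the pg's epoch is gone, warn and skip the pg instead of asserting later.

    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    using epoch_t = uint32_t;

    // stand-in for the OSD's store of historical osdmaps
    std::map<epoch_t, std::string> osdmaps;

    bool have_map(epoch_t e) { return osdmaps.count(e) != 0; }

    struct PGInfo { std::string pgid; epoch_t map_epoch; };

    void load_pgs(const std::vector<PGInfo> &on_disk) {
      for (const auto &pg : on_disk) {
        if (!have_map(pg.map_epoch)) {
          // complain to the log and skip, rather than hitting an assert later
          std::cerr << "pg " << pg.pgid << " refers to missing map epoch "
                    << pg.map_epoch << ", skipping\n";
          continue;
        }
        // ... normal pg load path ...
      }
    }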
Ken Dreyer [Wed, 22 Apr 2015 22:36:42 +0000 (16:36 -0600)]
init-radosgw: run RGW as root
The ceph-radosgw service fails to start if the httpd package is not
installed. This is because the init.d file attempts to start the RGW
process with the "apache" UID. If a user is running civetweb, there is
no reason for the httpd or apache2 package to be present on the system.
Switch the init scripts to use "root" as is done on Ubuntu.
http://tracker.ceph.com/issues/11453
Refs: #11453
Reported-by: Vickey Singh <vickey.singh22693@gmail.com>
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
(cherry picked from commit 47339c5ac352d305e68a58f3d744c3ce0fd3a2ac)
The cherry-picked commit did not compile as-is, because the hobject_t
class in firefly lacks a get_hash() method, which was added in 6de83d4.
To get the patch to compile, I replaced i->second.get_hash() with
i->second.hash.
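For context, the substitution is roughly this (a schematic sketch; the real hobject_t has many more members): in later branches get_hash() is just an accessor for the hash field, which in firefly is still accessed directly.

    #include <cstdint>

    // schematic: firefly exposes the field directly...
    struct hobject_t_firefly {
      uint32_t hash = 0;
    };

    // ...while later branches add an accessor (6de83d4), so
    // i->second.get_hash() becomes i->second.hash when backporting.
    struct hobject_t_later {
      uint32_t hash = 0;
      uint32_t get_hash() const { return hash; }
    };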
Mykola Golub [Tue, 3 Mar 2015 06:45:58 +0000 (08:45 +0200)]
osd: fix PG::all_unfound_are_queried_or_lost for non-existent osds
A common mistake upon osd loss is to remove the osd from the crush map
before marking the osd lost. This leaves the user unable to mark the osd
lost, which is needed to satisfy all_unfound_are_queried_or_lost.
The simple solution is for all_unfound_are_queried_or_lost to ignore
the osd if it does not exist.
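A condensed sketch of the idea (hypothetical types, not the real PG code): while walking the peers that might have the unfound objects, an osd that no longer exists in the osdmap is simply skipped instead of blocking the "queried or lost" check.

    #include <set>

    struct OSDMapView {
      std::set<int> existing;                       // osds present in the map
      bool exists(int osd) const { return existing.count(osd) != 0; }
      bool is_lost(int osd) const { return false; } // stub for the sketch
    };

    bool all_unfound_are_queried_or_lost(const OSDMapView &osdmap,
                                         const std::set<int> &might_have_unfound,
                                         const std::set<int> &peers_queried) {
      for (int osd : might_have_unfound) {
        if (!osdmap.exists(osd))
          continue;                                 // removed from crush: ignore it
        if (!peers_queried.count(osd) && !osdmap.is_lost(osd))
          return false;                             // still need to query or mark lost
      }
      return true;
    }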
Guang Yang [Mon, 29 Sep 2014 08:21:10 +0000 (08:21 +0000)]
PG::actingset should be used when checking the number of acting OSDs for a given PG.
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
(cherry picked from commit 19be358322be48fafa17b28054619a8b5e7d403b)
Conflicts:
    src/osd/PG.cc
        PG::get_backfill_priority() doesn't exist in firefly
        Variation in code related to no "undersized" state in firefly
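The distinction, sketched very loosely below (made-up names, not the PG class): the acting vector can contain placeholder entries for missing shards, so counting acting OSDs should use the set of real members rather than the raw vector length.

    #include <set>
    #include <vector>

    constexpr int NONE = -1;   // placeholder for a missing shard (like CRUSH_ITEM_NONE)

    // acting as stored: one slot per shard, possibly with holes
    std::vector<int> acting = {3, NONE, 7};

    // actingset: only the osds that are actually acting
    std::set<int> make_actingset(const std::vector<int> &acting) {
      std::set<int> s;
      for (int osd : acting)
        if (osd != NONE)
          s.insert(osd);
      return s;
    }

    // acting.size() == 3 here, but only 2 osds are really acting,
    // so checks against the required count should use make_actingset(acting).size().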
Samuel Just [Thu, 26 Mar 2015 17:26:48 +0000 (10:26 -0700)]
ReplicatedPG::cancel_pull: requeue waiters as well
If we are in recovery_wait, we might not recover that object as part of
recover_primary for some time. Worse, if we are waiting on a backfill
which is blocked waiting on a copy_from on the missing object in
question, it can become a deadlock.
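Roughly, the fix behaves like this simplified sketch (hypothetical structures, not ReplicatedPG itself): when a pull is cancelled, any ops parked waiting for that object are put back on the queue, so nothing stays blocked behind a pull that will never complete.

    #include <deque>
    #include <list>
    #include <map>
    #include <string>

    struct Op { std::string name; };

    std::map<std::string, std::list<Op>> waiting_for_object;  // blocked ops per object
    std::deque<Op> requeue_list;                              // ops to retry

    void cancel_pull(const std::string &soid) {
      // ... abandon the in-flight pull for soid ...

      // requeue everything that was waiting on this object
      auto it = waiting_for_object.find(soid);
      if (it != waiting_for_object.end()) {
        for (auto &op : it->second)
          requeue_list.push_back(op);
        waiting_for_object.erase(it);
      }
    }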
Loic Dachary [Wed, 18 Mar 2015 13:17:00 +0000 (14:17 +0100)]
osd: erasure-code-profile incremental rm before set
It is possible for an incremental change to have both an rm and a set for
a given erasure code profile. This only happens when an rm is followed by a
set; when a set is followed by an rm, the rm removes the pending set from
the incremental change.
The logic is the same for pool create and pool delete.
We must apply the incremental erasure-code-profile removal before the
creation; otherwise an rm and a set in the same proposal will ignore the set.
This fix is minimal. A better change would be for erasure-code-profile
set to check whether there is a pending removal and wait_for_finished_proposal
before creating.
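The ordering fix can be pictured with this simplified sketch of applying an incremental (stand-in types, not the real OSDMap::Incremental): removals are applied first, so an rm followed by a set in the same proposal leaves the new profile in place.

    #include <map>
    #include <set>
    #include <string>

    using Profile = std::map<std::string, std::string>;

    struct Incremental {
      std::set<std::string> old_profiles;            // profiles to remove
      std::map<std::string, Profile> new_profiles;   // profiles to (re)create
    };

    void apply(std::map<std::string, Profile> &profiles, const Incremental &inc) {
      // apply removals first...
      for (const auto &name : inc.old_profiles)
        profiles.erase(name);
      // ...then creations, so "rm foo; set foo" ends with foo present
      for (const auto &p : inc.new_profiles)
        profiles[p.first] = p.second;
    }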
Samuel Just [Tue, 24 Mar 2015 22:14:34 +0000 (15:14 -0700)]
ReplicatedPG: trim backfill intervals based on peer's last_backfill_started
Otherwise, we fail to trim the peer's last_backfill_started and get bug 11199.
1) osd 4 backfills up to 31bccdb2/mira01213209-286/head (henceforth: foo)
2) Interval change happens
3) osd 0 now finds itself backfilling to 4 (lb=foo) and osd.5
(lb=b6670ba2/mira01213209-160/snapdir//1, henceforth: bar)
4) recover_backfill causes both 4 and 5 to scan forward, so 4 has an interval
starting at foo, 5 has an interval starting at bar.
5) Once those have come back, recover_backfill attempts to trim off the
last_backfill_started, but 4's interval starts after that, so foo remains in
osd 4's interval (this is the bug)
7) We serve a copyfrom on foo (sent to 4 as well).
8) We eventually get to foo in the backfilling. Normally, they would have the
same version, but of course we don't update osd.4's interval from the log since
it should not have received writes in that interval. Thus, we end up trying to
recover foo on osd.4 anyway.
9) But, an interval change happens between removing foo from osd.4 and
completing the recovery, leaving osd.4 without foo, but with lb >= foo
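The shape of the fix, in a very reduced sketch (hypothetical BackfillInterval; object names are plain strings ordered lexically here): each peer's scanned interval is trimmed up to what that peer has already been backfilled to, so an object the peer already holds cannot linger in its interval and be recovered again.

    #include <map>
    #include <string>

    // hypothetical, reduced backfill interval: object -> version
    struct BackfillInterval {
      std::map<std::string, int> objects;

      // drop everything at or below the bound (already handled)
      void trim_to(const std::string &bound) {
        auto it = objects.begin();
        while (it != objects.end() && it->first <= bound)
          it = objects.erase(it);
      }
    };

    // after rescanning, trim each peer's interval using that peer's own
    // backfill progress (e.g. the later of the primary's last_backfill_started
    // and the peer's backfill bound), not only the primary's position:
    //   peer_backfill_info[peer].trim_to(peer_trim_bound);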
Samuel Just [Tue, 24 Mar 2015 17:48:02 +0000 (10:48 -0700)]
PG: set/clear CREATING in Primary state entry/exit
Previously, we did not actually set it when we got a pg creation message from
the mon; it would only get set on the first start_peering_interval after
that point. If we don't get that far, but do send a stat update to the mon, we
can end up with bug 11197. Instead, let's just set it and clear it upon entry
into and exit from the Primary state.
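Conceptually the change pins the flag to the lifetime of the Primary state, roughly as in this sketch (a stand-in state class, not the real boost::statechart machinery): the flag is set when the state is entered and cleared when it is exited, so it no longer depends on reaching start_peering_interval first.

    #include <cstdint>

    constexpr uint32_t PG_STATE_CREATING = 1u << 0;   // illustrative value

    struct PGStub {
      uint32_t state = 0;
      void state_set(uint32_t f)   { state |= f; }
      void state_clear(uint32_t f) { state &= ~f; }
    };

    // stand-in for the Primary recovery state: entry sets the flag,
    // exit clears it, so stat updates sent in between see it correctly
    struct Primary {
      PGStub *pg;
      explicit Primary(PGStub *p) : pg(p) { pg->state_set(PG_STATE_CREATING); }
      ~Primary()                          { pg->state_clear(PG_STATE_CREATING); }
    };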
Jason Dillaman [Mon, 16 Mar 2015 22:40:49 +0000 (18:40 -0400)]
librbd: snap_remove should ignore -ENOENT errors
If the attempt to deregister the snapshot from the parent
image fails with -ENOENT, ignore the error as it is safe
to assume that the child is not associated with the parent.
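The error handling amounts to the pattern below (a sketch only; deregister_child is a hypothetical stand-in for the parent/child bookkeeping call): a -ENOENT from the deregistration step is treated as success because the association evidently no longer exists.

    #include <cerrno>

    // hypothetical stand-in for removing the child from the parent's records
    int deregister_child() { return -ENOENT; }   // stub: pretend the child is already gone

    int snap_remove_step() {
      int r = deregister_child();
      if (r == -ENOENT) {
        // child was not registered with the parent: nothing to undo
        r = 0;
      }
      if (r < 0)
        return r;     // any other error is still fatal
      // ... continue removing the snapshot ...
      return 0;
    }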
Yehuda Sadeh [Fri, 27 Mar 2015 23:32:48 +0000 (16:32 -0700)]
rgw: generate new tag for object when setting object attrs
Fixes: #11256
Backport: firefly, hammer
Beforehand we were reusing the object's tag, which is problematic as
this tag is used for bucket index updates, and we might be clobbering a
racing update (like object removal).
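The gist can be shown with a small sketch (hypothetical helper, not the rgw code): instead of copying the tag already stored with the object, the attr-setting path mints a fresh tag, so a concurrent operation keyed on the old tag is not clobbered.

    #include <atomic>
    #include <cstdint>
    #include <string>

    // hypothetical unique-tag generator (the real code derives tags differently)
    std::string new_object_tag(const std::string &prefix) {
      static std::atomic<uint64_t> counter{0};
      return prefix + "." + std::to_string(++counter);
    }

    struct ObjState { std::string write_tag; };

    void set_attrs(ObjState &state) {
      // old behaviour: reuse state.write_tag as-is (races with e.g. removal)
      // new behaviour: mint a fresh tag for this bucket-index update
      state.write_tag = new_object_tag("tag");
      // ... prepare/complete the bucket index op with state.write_tag ...
    }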
Samuel Just [Fri, 20 Mar 2015 22:28:15 +0000 (15:28 -0700)]
ReplicatedPG::promote_object: check scrubber and block if necessary
Otherwise, we might attempt to promote into an in-progress scrub
interval, causing bug 11156. I would have added a return value to
promote_object(), but could not find an existing user which
cared to distinguish the cases, even with a null op passed.
All existing users are in maybe_handle_cache. The ones which
pass a null op are for promoting the object in parallel
with a proxy -- a case where not actually performing the promote
does not really matter.
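In outline (hypothetical members, not the real ReplicatedPG): before starting the promote, the object is checked against the active scrub range, and the op, if any, is parked on the scrub wait list rather than writing into the range being scrubbed.

    #include <list>
    #include <memory>
    #include <string>

    struct Op { std::string desc; };
    using OpRef = std::shared_ptr<Op>;

    struct Scrubber {
      std::string begin, end;                 // range currently being scrubbed
      std::list<OpRef> waiting;               // ops blocked until scrub finishes
      bool blocks(const std::string &oid) const {
        return oid >= begin && oid < end;
      }
    };

    void promote_object(Scrubber &scrubber, const std::string &oid, OpRef op) {
      if (scrubber.blocks(oid)) {
        if (op)
          scrubber.waiting.push_back(op);     // retry after the scrub interval
        return;                               // do not promote into the scrub range
      }
      // ... start the actual promotion (copy from the base tier) ...
    }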
David Zafman [Sat, 21 Mar 2015 00:48:01 +0000 (17:48 -0700)]
ceph-objectstore-tool: Use exit status 11 for incompatible import attempt
This is used so upgrade testing doesn't generate a false failure.
Fixes: #11139
Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 175aff8afe8215547ab57f8d8017ce8fdc0ff543)
Loic Dachary [Wed, 18 Mar 2015 23:32:39 +0000 (00:32 +0100)]
doc,tests: force checkout of submodules
When updating submodules, always checkout even if the HEAD is the
desired commit hash (update --force) to avoid the following:
* a directory gmock exists in hammer
* a submodule gmock replaces the directory gmock in master
* checkout master + submodule update : gmock/.git is created
* checkout hammer : the gmock directory still contains the .git from
master because it did not exist at the time and checkout won't
remove untracked directories
* checkout master + submodule update : git rev-parse HEAD is
at the desired commit although the content of the gmock directory
is from hammer
Guang Yang [Thu, 26 Feb 2015 08:13:12 +0000 (08:13 +0000)]
osd: fix negative degraded objects during backfilling
When there are delete requests during backfilling, the reported number of degraded
objects can be negative, as the primary's num_objects is the latest (locally) but
the numbers for the replicas might not yet reflect the deletions. A simple fix is to
ignore the negative subtracted value.
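The arithmetic of the fix is simply a clamp, as in this sketch (stand-in variable names): the per-replica difference can dip below zero while deletes are in flight, so a negative value is treated as zero before being added to the degraded total.

    #include <cstdint>
    #include <vector>

    int64_t degraded_objects(int64_t primary_num_objects,
                             const std::vector<int64_t> &replica_num_objects) {
      int64_t degraded = 0;
      for (int64_t replica_count : replica_num_objects) {
        int64_t diff = primary_num_objects - replica_count;
        if (diff > 0)          // ignore negative values caused by in-flight deletes
          degraded += diff;
      }
      return degraded;
    }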