Samuel Just [Fri, 15 Mar 2013 01:52:02 +0000 (18:52 -0700)]
OSD: expand_pg_num after pg removes
Otherwise:
1) expand_pg_num removes a splitting pg entry
2) peering thread grabs pg lock and starts split
3) OSD::consume_map grabs pg lock and starts removal
At step 2), we run afoul of the assert(is_splitting)
check in split_pgs. This way, the would-be splitting
pg is marked as removed prior to the splitting state
being updated.
Backport: bobtail
Fixes: #4449
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Samuel Just [Fri, 15 Mar 2013 02:59:36 +0000 (19:59 -0700)]
PG: ignore non MISSING pg query in ReplicaActive
1) Replica sends notify
2) Prior to processing notify, primary queues query to replica
3) Primary processes notify and activates, sending MOSDPGLog
to replica.
4) Primary does do_notifies at end of process_peering_events
and sends the Query.
5) Replica sees MOSDPGLog and activates
6) Replica sees Query and asserts.
In the above case, the Replica should simply ignore the old
Query.
Fixes: #4050
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
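The race in the numbered steps above comes down to a replica that is already active receiving a Query that was queued before activation. A minimal Python sketch of the intended behaviour follows; it is an illustration only, not the OSD code. The QueryType enum, the function name, and the returned action strings are all hypothetical.

```python
from enum import Enum


class QueryType(Enum):
    # Illustrative query kinds; only MISSING is named in the commit title.
    INFO = "info"
    LOG = "log"
    MISSING = "missing"


def handle_query_in_replica_active(query_type):
    """Return the action an already-active replica takes for a query."""
    if query_type is QueryType.MISSING:
        # A MISSING query is still meaningful once the replica is active.
        return "send_missing"
    # Any other query is stale (queued before activation): drop it
    # instead of asserting.
    return "ignore"
```

With this shape, the old Query from step 4 falls through to "ignore" rather than tripping an assertion.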
Sage Weil [Fri, 15 Mar 2013 04:05:07 +0000 (21:05 -0700)]
ceph-disk-activate: identify cluster .conf by fsid
Determine what cluster the disk belongs to by checking the fsid defined
in /etc/ceph/*.conf. Previously we hard-coded 'ceph'.
Note that this has the nice side-effect that if we have a disk with a
bad/different fsid, we now fail to activate it. Previously, we would
mount and start ceph-osd, but the daemon would fail to authenticate
because it was part of the wrong cluster.
Fixes: #3253
Signed-off-by: Sage Weil <sage@inktank.com>
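The fsid lookup described above can be sketched roughly as follows. This is not the ceph-disk-activate code: the function name and the error type are hypothetical, and it assumes each conf file is simple enough for Python's configparser and keeps fsid in a [global] section.

```python
import configparser
import glob
import os


def find_cluster_by_fsid(fsid, conf_dir="/etc/ceph"):
    """Return the cluster name whose <name>.conf declares the given fsid."""
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        parser = configparser.ConfigParser(strict=False, interpolation=None)
        try:
            parser.read(path)
        except configparser.Error:
            continue  # skip files this simple parser cannot read
        # Assumption: fsid is set in the [global] section.
        if parser.get("global", "fsid", fallback=None) == fsid:
            # "/etc/ceph/foo.conf" -> cluster name "foo"
            return os.path.splitext(os.path.basename(path))[0]
    raise LookupError("no conf in %s declares fsid %s" % (conf_dir, fsid))
```

A disk whose fsid matches no conf file raises here, which mirrors the "fail to activate" side effect noted in the commit message.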
Gary Lowell [Tue, 12 Mar 2013 23:59:42 +0000 (16:59 -0700)]
debian/control: Fix for moved file
The ceph-mds.conf file moved from the ceph package to the
ceph-mds package. Add replaces/breaks statements to the
control file to handle this on upgrade.
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Sage Weil [Thu, 14 Mar 2013 23:18:26 +0000 (16:18 -0700)]
ceph-disk-activate: abort if target position is already mounted
If the target position is already a mount point, fail to move our mount
over to it. This usually indicates that a different osd.N from a
different cluster instance is in that position.
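A minimal sketch of the check described above, assuming a POSIX system. The function name and exception type are hypothetical; os.path.ismount is the standard-library call.

```python
import os


def check_mount_target(path):
    """Return path if it is free; raise if it is already a mount point."""
    if os.path.ismount(path):
        # Something else (likely another cluster's osd.N) is mounted here.
        raise RuntimeError(
            "%s is already a mount point; refusing to move mount" % path
        )
    return path
```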
Sage Weil [Thu, 14 Mar 2013 19:33:08 +0000 (12:33 -0700)]
debian: add start ceph-mds-all on ceph-mds install
This ensures that when we then start individual mds instances, we can
stop ceph-mds-all and they will get stopped. We do the same already for
ceph-all.
We run --mkfs with the osd disk mounted in a temporary location, so it is
necessary to explicitly pass in these paths.
If we want to support journals in a different location, we need to make
ceph-disk-prepare update the journal symlink accordingly, rather than
control it via the config option.
David Zafman [Wed, 13 Mar 2013 03:49:25 +0000 (20:49 -0700)]
osd: data loss: low space handling
Add check whether to allow writing ops based on failsafe full percentage
Check for failsafe nearfull warning or full error message every heartbeat
Use clock to limit messages to every 30 secs (osd_op_complaint_time)
Feature: #4197
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
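The three points above (gate writes on a failsafe full ratio, check on every heartbeat, rate-limit the messages) might be sketched as below. This is an illustration, not the OSD implementation: the class name, the ratio defaults, and the message list are hypothetical; only the 30 second interval comes from the commit message.

```python
import time


class FailsafeFullCheck:
    def __init__(self, nearfull_ratio=0.90, full_ratio=0.97,
                 complaint_interval=30.0, clock=time.monotonic):
        # The ratio defaults are placeholders, not Ceph's actual values.
        self.nearfull_ratio = nearfull_ratio
        self.full_ratio = full_ratio
        self.complaint_interval = complaint_interval
        self.clock = clock
        self.last_complaint = None
        self.messages = []

    def allow_write(self, used_ratio):
        """Writes are refused once usage reaches the failsafe full ratio."""
        return used_ratio < self.full_ratio

    def heartbeat(self, used_ratio):
        """Run on every heartbeat; returns 'ok', 'nearfull' or 'full'."""
        if used_ratio >= self.full_ratio:
            state = "full"
        elif used_ratio >= self.nearfull_ratio:
            state = "nearfull"
        else:
            return "ok"
        now = self.clock()
        # Emit at most one message per complaint_interval seconds.
        if (self.last_complaint is None
                or now - self.last_complaint >= self.complaint_interval):
            self.last_complaint = now
            self.messages.append("%s: %.0f%% used" % (state, used_ratio * 100))
        return state
```

Passing the clock in as a parameter keeps the rate limit easy to exercise without waiting in real time.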
Samuel Just [Mon, 4 Mar 2013 19:16:05 +0000 (11:16 -0800)]
OSD,PG: add upgrade procedure for snap_mapper
Also, sub_op_modify transactions currently carry the operations
for creating snap links in the shipped transaction. To handle
ops shipped by unenlightened osds, transactions can now be
tagged with a tolerate_collection_add_enoent flag.
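One way to picture the tolerate_collection_add_enoent flag is the toy model below. It is only a sketch: the Transaction class, the dict-of-sets store, and the op tuple layout are hypothetical stand-ins, and only the flag name comes from the commit message.

```python
import errno


class Transaction:
    def __init__(self, ops, tolerate_collection_add_enoent=False):
        # ops: list of (src_collection, dst_collection, object_name)
        self.ops = ops
        self.tolerate_collection_add_enoent = tolerate_collection_add_enoent


def apply_collection_adds(store, txn):
    """Apply collection_add ops to store (dict: collection -> set of objects)."""
    for src, dst, obj in txn.ops:
        if obj not in store.get(src, set()):
            if txn.tolerate_collection_add_enoent:
                continue  # op shipped by an older OSD: skip it quietly
            raise OSError(errno.ENOENT, "no such object", obj)
        store.setdefault(dst, set()).add(obj)
```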
Samuel Just [Tue, 5 Mar 2013 22:34:47 +0000 (14:34 -0800)]
osd/: Integrate SnapMapper with OSD
- SnapTrimmer now uses SnapMapper to get the next object to trim
- Entries for a snap are implicitly removed from SnapMapper when
the last object is trimmed, so no need for the adjust_local_snaps
logic.
- Scrub now compares the object_info snaps set on the object attr
with the version stored in the SnapMapper.
Samuel Just [Thu, 7 Mar 2013 20:53:51 +0000 (12:53 -0800)]
PG: check_recovery_sources must happen even if not active
missing_loc/missing_loc_sources also must be cleaned up
if a peer goes down during peering:
1) pg is in GetInfo, acting is [3,1]
2) we find object A on osd [0] in GetInfo
3) 0 goes down, no new peering interval since it is neither up nor
acting, but peer_missing[0] is removed.
4) pg goes active and tries to pull A from 0 since missing_loc did not get
cleaned up.
Backport: bobtail
Fixes: #4371
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
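The cleanup that the scenario above calls for can be sketched with plain Python containers. The data shapes here are assumptions for illustration (missing_loc as a dict from object name to a set of OSD ids, missing_loc_sources as a set of OSD ids); the function only borrows its name from the commit title.

```python
def check_recovery_sources(missing_loc, missing_loc_sources, up_osds):
    """Drop recovery sources that are no longer up; return the dropped OSDs.

    Mutates missing_loc and missing_loc_sources in place.
    """
    down = {osd for osd in missing_loc_sources if osd not in up_osds}
    missing_loc_sources.difference_update(down)
    for obj in list(missing_loc):
        missing_loc[obj].difference_update(down)
        if not missing_loc[obj]:
            # No remaining location for this object.
            del missing_loc[obj]
    return down
```

Running this regardless of whether the pg is active means that, in step 4, osd 0 is no longer listed as a source for object A.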
Sage Weil [Wed, 13 Mar 2013 02:44:20 +0000 (19:44 -0700)]
mds: mark con for closed session disposable
If there is a fault while delivering the message, close the con. This will
clean up the Session state from memory. If the client doesn't get the
CLOSED message, they will reconnect (from their perspective, it is still
a lossless connection) and get a remote_reset event telling them that the
session is gone. The client code already handles this case properly.
Note that way back in 4ac45200f10e0409121948cea5226ca9e23bb5fb we removed
this because the client would reuse the same connection when it reopened
the session. Now the client never does that; it will mark_down the con
as soon as it is closed and open a new one for a new session... which means
the MDS will get a remote_reset and close out the old session.
Sage Weil [Wed, 13 Mar 2013 23:06:02 +0000 (16:06 -0700)]
client: validate/lookup mds session in each message handler
For every message handler, look up the MetaSession by int mds and verify
that the Connection* matches properly. If so, proceed; otherwise, discard
the message.
In the future, we probably want to link the MetaSession to the Connection's
priv field, but that can come later.
Clean up a bunch of submethods that take int mds while we're here.
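The per-handler validation described above might look roughly like this. It is a sketch under assumptions: the MetaSession stand-in, the sessions dict keyed by mds number, and both function names are hypothetical, not the client's actual structures.

```python
class MetaSession:
    def __init__(self, mds, con):
        self.mds = mds
        self.con = con


def lookup_valid_session(sessions, mds, con):
    """Return the session for mds only if it is bound to this connection."""
    session = sessions.get(mds)
    if session is None or session.con is not con:
        return None
    return session


def handle_message(sessions, mds, con, payload, handled):
    """Process payload if the session checks out; otherwise discard it."""
    if lookup_valid_session(sessions, mds, con) is None:
        return False  # unknown mds or stale connection: drop the message
    handled.append((mds, payload))
    return True
```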