Sam Lang [Tue, 26 Feb 2013 18:22:47 +0000 (12:22 -0600)]
include/elist: Fix clear() to use pop_front()
elist<T>::clear() calls remove(), which is not a method
defined on elist<T> (according to git, it never was).
Because elist is a template and nothing references clear(),
the member is never instantiated and the call is never checked;
the only candidates the compiler can match remove(T) against are
unrelated, such as the remove(const char *) library function
declared in stdio.h.
Once clear() is invoked on an instance of elist<T>, we get the
compile error shown below.
The fix here is to use pop_front() instead of remove().
Compile error is:
In file included from ../../src/mds/CInode.h:22:0,
from ../../src/mds/CInode.cc:19:
../../src/include/elist.h: In instantiation of ‘void elist<T>::clear() [with T = cinode_backtrace_info_t*]’:
../../src/mds/CInode.cc:1129:20: required from here
../../src/include/elist.h:101:7: error: no matching function for call to ‘remove(cinode_backtrace_info_t*)’
../../src/include/elist.h:101:7: note: candidates are:
In file included from ../../src/mds/CInode.cc:17:0:
/usr/include/stdio.h:179:12: note: int remove(const char*)
/usr/include/stdio.h:179:12: note: no known conversion for argument 1 from ‘cinode_backtrace_info_t*’ to ‘const char*’
In file included from /usr/include/c++/4.7/algorithm:63:0,
from /usr/include/c++/4.7/backward/hashtable.h:65,
from /usr/include/c++/4.7/ext/hash_map:65,
from ../../src/include/encoding.h:292,
from ../../src/common/entity_name.h:22,
from ../../src/common/config.h:26,
from ../../src/mds/CInode.h:20,
from ../../src/mds/CInode.cc:19:
/usr/include/c++/4.7/bits/stl_algo.h:1117:5: note: template<class _FIter, class _Tp> _FIter std::remove(_FIter, _FIter, const _Tp&)
/usr/include/c++/4.7/bits/stl_algo.h:1117:5: note: template argument deduction/substitution failed:
In file included from ../../src/mds/CInode.h:22:0,
from ../../src/mds/CInode.cc:19:
../../src/include/elist.h:101:7: note: candidate expects 3 arguments, 1 provided
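The fix can be illustrated with a minimal sketch. This is a hypothetical, simplified intrusive list (item and mini_list are made-up names), not the actual elist<T> code; it only shows clear() built on pop_front() so that no free function named remove() is involved.

```cpp
#include <cassert>

// Hypothetical list node; NOT Ceph's elist<T>::item.
struct item {
  item *prev = nullptr;
  item *next = nullptr;

  bool is_linked() const { return next != nullptr; }

  // Detach this node from whatever list it is on.
  void unlink() {
    if (!is_linked())
      return;
    prev->next = next;
    next->prev = prev;
    prev = next = nullptr;
  }
};

// Hypothetical circular list with a sentinel head node.
class mini_list {
  item head;  // an empty list has head pointing at itself

public:
  mini_list() { head.prev = head.next = &head; }
  mini_list(const mini_list&) = delete;
  mini_list& operator=(const mini_list&) = delete;

  bool empty() const { return head.next == &head; }

  void push_back(item *i) {
    assert(!i->is_linked());
    i->prev = head.prev;
    i->next = &head;
    head.prev->next = i;
    head.prev = i;
  }

  void pop_front() {
    assert(!empty());
    head.next->unlink();
  }

  // clear() uses only members that exist on the class, so the call
  // cannot silently resolve to ::remove(const char *).
  void clear() {
    while (!empty())
      pop_front();
  }
};
```

Because every call inside clear() names a real member, the template version of this pattern fails at definition time rather than at first use if a member is missing.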
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Sam Lang [Tue, 5 Mar 2013 14:28:47 +0000 (08:28 -0600)]
mds: Use map for CInode pinrefs
Implements pin refs on the inode as a map instead of
a multiset, allowing individual ref counts to act as
real references with values that can be >1.
The pin refs are only used for debugging, but allowing
them to be >1 avoids the need for a separate state field
for things like DIRTYPARENT.
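A rough sketch of the idea, assuming pin types are plain ints. The pin_tracker class and its member names are hypothetical and do not mirror CInode's real members.

```cpp
#include <cassert>
#include <map>

// Hypothetical per-type pin accounting.
class pin_tracker {
  int ref = 0;                 // total number of pins held
  std::map<int, int> ref_map;  // pin type -> number of holders

public:
  void get(int by) {
    ++ref;
    ++ref_map[by];  // operator[] value-initializes a new entry to 0
  }

  void put(int by) {
    auto it = ref_map.find(by);
    assert(it != ref_map.end() && it->second > 0);
    --ref;
    if (--it->second == 0)
      ref_map.erase(it);  // drop the entry once the last holder is gone
  }

  // Number of outstanding pins of one type; may be greater than 1.
  int count(int by) const {
    auto it = ref_map.find(by);
    return it == ref_map.end() ? 0 : it->second;
  }

  int total() const { return ref; }
};
```

With a per-type count, a caller can ask whether a given pin is held at all (count(by) > 0) instead of keeping a separate state flag.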
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Sam Lang [Tue, 26 Feb 2013 00:51:19 +0000 (18:51 -0600)]
client: Ensure inode/dentries are ref counted
The MetaRequest holds onto inodes and dentries
for retrying unsafe requests, but those objects
might be removed from the cache (unlink for example)
causing the inode/dentry to be freed. Ensure that
the inode/dentry is never freed while the MetaRequest
holds onto it by putting/getting the refs using
set/get interfaces.
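The set/get pattern might look roughly like the sketch below. Inode and MetaRequest here are stripped-down, hypothetical stand-ins; the real client classes carry far more state, and the real put() frees the object when the count reaches zero.

```cpp
#include <cassert>

// Hypothetical stand-in for the client's ref-counted inode.
struct Inode {
  int nref = 0;
  void get() { ++nref; }
  void put() {
    assert(nref > 0);
    --nref;  // the real code would free the object at zero
  }
};

// Hypothetical stand-in for MetaRequest: the raw pointer is private,
// so every assignment goes through set_inode() and adjusts the refs.
class MetaRequest {
  Inode *inode_ = nullptr;

public:
  MetaRequest() = default;
  MetaRequest(const MetaRequest&) = delete;
  MetaRequest& operator=(const MetaRequest&) = delete;
  ~MetaRequest() { set_inode(nullptr); }

  void set_inode(Inode *in) {
    if (in)
      in->get();  // take the new ref first; safe when in == inode_
    if (inode_)
      inode_->put();
    inode_ = in;
  }

  Inode *inode() const { return inode_; }
};
```

Since the request holds its own reference, the cache dropping its reference (for example on unlink) leaves the count above zero until the request is destroyed.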
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Samuel Just [Fri, 15 Mar 2013 22:13:46 +0000 (15:13 -0700)]
OSD: split temp collection as well
Otherwise, when we eventually remove the temp collection, there might be
objects in the temp collection which were independently pulled into the child
pg collection. Thus, removing the old stale parent link from its temp
collection also blasts the omap entries and snap mappings for the real child
object.
Backport: bobtail
Fixes: #4452
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Gary Lowell [Fri, 15 Mar 2013 05:54:05 +0000 (22:54 -0700)]
ceph.spec.in: Additional clean-up on package removal
When removing the last instance of ceph, also remove the files
created by ceph during operation. These consist of the files
under /var/lib/ceph, /etc/ceph, and /var/log/ceph. Bug #4415.
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Samuel Just [Fri, 15 Mar 2013 17:59:38 +0000 (10:59 -0700)]
FileJournal,Journal: detect some corrupt journal scenarios
When the checksum or footer is invalid, we will now try to
look at the next entry. If we find a valid entry, it is likely
that the journal is corrupt.
Samuel Just [Tue, 26 Feb 2013 01:31:12 +0000 (17:31 -0800)]
FileJournal: add committed_up_to to header
header_t::committed_up_to provides a lower bound for safely committed
journal entries. If read_entry fails prior to committed_up_to, we
know we have a corrupt journal entry. Furthermore, if
journal_write_header_frequency is not 0, we will write out the
journal header once every journal_write_header_frequency
journal writes.
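The two rules above can be sketched as follows. The names classify_entry, read_result and header_writer are hypothetical, and treating an entry at exactly committed_up_to as committed is an assumption of this sketch, not something the message states.

```cpp
#include <cstdint>

enum class read_result { ok, end_of_journal, corrupt };

// Hypothetical helper: classify the outcome of reading the entry at
// sequence number `seq`. `entry_valid` stands for "checksum and footer
// verified".
read_result classify_entry(bool entry_valid, std::uint64_t seq,
                           std::uint64_t committed_up_to) {
  if (entry_valid)
    return read_result::ok;
  // An entry known to be safely committed cannot be a torn write at
  // the tail of the journal, so a failure here means corruption.
  // ASSUMPTION: the bound is inclusive.
  if (seq <= committed_up_to)
    return read_result::corrupt;
  return read_result::end_of_journal;
}

// Hypothetical counter for the periodic header write.
class header_writer {
  std::uint64_t frequency_;  // journal_write_header_frequency; 0 disables
  std::uint64_t since_last_ = 0;

public:
  explicit header_writer(std::uint64_t frequency) : frequency_(frequency) {}

  // Call once per journal write; returns true when the header should
  // be written out along with this write.
  bool note_write() {
    if (frequency_ == 0)
      return false;
    if (++since_last_ < frequency_)
      return false;
    since_last_ = 0;
    return true;
  }
};
```

Writing the header more often keeps committed_up_to closer to the true commit point, which narrows the range of entries whose failure is ambiguous.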
Samuel Just [Fri, 15 Mar 2013 01:52:02 +0000 (18:52 -0700)]
OSD: expand_pg_num after pg removes
Otherwise:
1) expand_pg_num removes a splitting pg entry
2) peering thread grabs pg lock and starts split
3) OSD::consume_map grabs pg lock and starts removal
At step 2), we run afoul of the assert(is_splitting)
check in split_pgs. This way, the would-be splitting
pg is marked as removed prior to the splitting state
being updated.
Backport: bobtail
Fixes: #4449
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Samuel Just [Fri, 15 Mar 2013 02:59:36 +0000 (19:59 -0700)]
PG: ignore non MISSING pg query in ReplicaActive
1) Replica sends notify
2) Prior to processing notify, primary queues query to replica
3) Primary processes notify and activates sending MOSDPGLog
to replica.
4) Primary does do_notifies at end of process_peering_events
and sends the Query.
5) Replica sees MOSDPGLog and activates
6) Replica sees Query and asserts.
In the above case, the Replica should simply ignore the old
Query.
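As a sketch of the intended behaviour only: the enum values and the function name below are hypothetical and do not reproduce the PG state machine; the point is that only a MISSING query gets a reply in ReplicaActive, and anything else is dropped instead of asserting.

```cpp
// Hypothetical query kinds; only MISSING is named in the message above.
enum class query_type { INFO, LOG, MISSING };

enum class reaction { send_missing, discard };

// Hypothetical handler for a query that arrives while the replica is
// already in ReplicaActive.
reaction replica_active_on_query(query_type t) {
  if (t == query_type::MISSING)
    return reaction::send_missing;
  // A stale query that the primary queued before it activated us:
  // ignore it rather than treating it as a protocol violation.
  return reaction::discard;
}
```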
Fixes: #4050
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 15 Mar 2013 04:05:07 +0000 (21:05 -0700)]
ceph-disk-activate: identify cluster .conf by fsid
Determine what cluster the disk belongs to by checking the fsid defined
in /etc/ceph/*.conf. Previously we hard-coded 'ceph'.
Note that this has the nice side-effect that if we have a disk with a
bad/different fsid, we now fail to activate it. Previously, we would
mount and start ceph-osd, but the daemon would fail to authenticate
because it was part of the wrong cluster.
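The lookup can be sketched as below. ceph-disk-activate itself is a script; this C++ version is only an illustration, and find_cluster_by_fsid is a hypothetical name. It assumes the fsid of each /etc/ceph/<name>.conf has already been parsed into a map and normalized to one case.

```cpp
#include <map>
#include <string>

// Hypothetical lookup: `conf_fsids` maps a cluster name (taken from
// /etc/ceph/<name>.conf) to the fsid found in that file. Parsing the
// files is left out of this sketch.
// Returns the matching cluster name, or an empty string when no conf
// carries the disk's fsid, in which case activation should fail.
std::string find_cluster_by_fsid(
    const std::map<std::string, std::string>& conf_fsids,
    const std::string& disk_fsid) {
  for (const auto& kv : conf_fsids) {
    if (kv.second == disk_fsid)
      return kv.first;
  }
  return std::string();
}
```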
Fixes: #3253
Signed-off-by: Sage Weil <sage@inktank.com>
Gary Lowell [Tue, 12 Mar 2013 23:59:42 +0000 (16:59 -0700)]
debian/control: Fix for moved file
The ceph-mds.conf file moved from the ceph package to the
ceph-mds package. Add replaces/breaks statements to the
control file to handle this on upgrade.
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Sage Weil [Thu, 14 Mar 2013 23:18:26 +0000 (16:18 -0700)]
ceph-disk-activate: abort if target position is already mounted
If the target position is already a mount point, fail to move our mount
over to it. This usually indicates that a different osd.N from a
different cluster instance is in that position.
Sage Weil [Thu, 14 Mar 2013 19:33:08 +0000 (12:33 -0700)]
debian: add start ceph-mds-all on ceph-mds install
This ensures that when we then start individual mds instances, we can
stop ceph-mds-all and they will get stopped. We do the same already for
ceph-all.
We run --mkfs with the osd disk mounted in a temporary location, so it is
necessary to explicitly pass in these paths.
If we want to support journals in a different location, we need to make
ceph-disk-prepare update the journal symlink accordingly, not control it via
the config option.
David Zafman [Wed, 13 Mar 2013 03:49:25 +0000 (20:49 -0700)]
osd: data loss: low space handling
Add check whether to allow writing ops based on failsafe full percentage
Check for failsafe nearfull warning or full error message every heartbeat
Use clock to limit messages to every 30 secs (osd_op_complaint_time)
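The three points above can be sketched as one small class. The class name, the ratios passed in, and the use of std::chrono::steady_clock are assumptions of this sketch and not the OSD's actual code.

```cpp
#include <chrono>

enum class space_state { ok, nearfull, full };

// Hypothetical failsafe checker; thresholds are fractions of capacity.
class failsafe_checker {
  typedef std::chrono::steady_clock clock_type;

  double nearfull_ratio_;
  double full_ratio_;
  std::chrono::seconds interval_;  // minimum gap between messages
  clock_type::time_point last_;
  bool complained_;

public:
  failsafe_checker(double nearfull, double full,
                   std::chrono::seconds interval)
      : nearfull_ratio_(nearfull), full_ratio_(full),
        interval_(interval), last_(), complained_(false) {}

  space_state classify(double used) const {
    if (used >= full_ratio_)
      return space_state::full;
    if (used >= nearfull_ratio_)
      return space_state::nearfull;
    return space_state::ok;
  }

  // Write ops are refused only once the failsafe full ratio is reached.
  bool writes_allowed(double used) const {
    return classify(used) != space_state::full;
  }

  // Called on every heartbeat; returns true when a warning or error
  // should be logged now, at most once per interval.
  bool should_complain(double used, clock_type::time_point now) {
    if (classify(used) == space_state::ok)
      return false;
    if (complained_ && now - last_ < interval_)
      return false;
    complained_ = true;
    last_ = now;
    return true;
  }
};
```

Passing the current time in as a parameter keeps the rate limit easy to exercise without waiting for real time to pass.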
Feature: #4197
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Mon, 4 Mar 2013 19:16:05 +0000 (11:16 -0800)]
OSD,PG: add upgrade procedure for snap_mapper
Also, sub_op_modify transactions currently carry the operations
for creating snap links in the shipped transaction. To handle
ops shipped by unenlightened osds, transactions can now be
tagged with a tolerate_collection_add_enoent flag.
Samuel Just [Tue, 5 Mar 2013 22:34:47 +0000 (14:34 -0800)]
osd/: Integrate SnapMapper with OSD
- SnapTrimmer now uses SnapMapper to get the next object to trim
- Entries for a snap are implicitly removed from SnapMapper when
the last object is trimmed, so no need for the adjust_local_snaps
logic.
- Scrub now compares the object_info snaps set on the object attr
with the version stored in the SnapMapper.
Samuel Just [Thu, 7 Mar 2013 20:53:51 +0000 (12:53 -0800)]
PG: check_recovery_sources must happen even if not active
missing_loc/missing_loc_sources also must be cleaned up
if a peer goes down during peering:
1) pg is in GetInfo, acting is [3,1]
2) we find object A on osd [0] in GetInfo
3) 0 goes down, no new peering interval since it is neither up nor
acting, but peer_missing[0] is removed.
4) pg goes active and tries to pull A from 0 since missing_loc did not get
cleaned up.
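The cleanup can be sketched as follows, with objects reduced to strings and OSDs to ints. The recovery_state struct is hypothetical and only borrows the names missing_loc, missing_loc_sources and check_recovery_sources from the message above.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified view of a PG's recovery bookkeeping.
struct recovery_state {
  std::map<std::string, std::set<int>> missing_loc;  // object -> sources
  std::set<int> missing_loc_sources;                 // all known sources

  // Forget every source that is no longer up. This runs whether or
  // not the PG is active, so a peer that dies during peering cannot
  // linger as a pull source.
  void check_recovery_sources(const std::set<int>& up_osds) {
    std::set<int> down;
    for (int osd : missing_loc_sources) {
      if (up_osds.count(osd) == 0)
        down.insert(osd);
    }
    if (down.empty())
      return;

    for (int osd : down)
      missing_loc_sources.erase(osd);

    for (auto it = missing_loc.begin(); it != missing_loc.end();) {
      for (int osd : down)
        it->second.erase(osd);
      if (it->second.empty())
        it = missing_loc.erase(it);  // no source left for this object
      else
        ++it;
    }
  }
};
```

In the scenario above, osd 0 going down during GetInfo would remove it from both structures, so the later pull of A from 0 is never attempted.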
Backport: bobtail
Fixes: #4371
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>