Sage Weil [Thu, 2 May 2019 19:34:53 +0000 (14:34 -0500)]
osd: clean up osdmap sharing
- always use the Session::last_sent_epoch value, both for clients and osds
- get rid of the stl map<> of peer epochs
- consolidate all map sharing into a single maybe_share_map()
- optionally take a lower bound on the peer's epoch, for use when it is
available (e.g., when we are handling a message that specifies what
epoch the peer had when it sent the message)
- use const OSDMapRef& where possible
- drop osd->is_active() check, since we no longer have any dependency on
OSD[Service] state beyond our osdmap
The old callchain was convoluted, partly because it was needlessly
separated into several layers of helpers, and partly because the tracking
for clients and peer OSDs was totally different.
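A minimal sketch of the consolidated shape, assuming a Session that tracks last_sent_epoch and an optional lower bound learned from the message in hand; the types and the send call are stand-ins, not the actual Ceph code:
```
#include <algorithm>
#include <cstdint>
#include <memory>

// Minimal stand-ins for the real Ceph types.
using epoch_t = uint32_t;
struct OSDMap {
  epoch_t epoch = 0;
  epoch_t get_epoch() const { return epoch; }
};
using OSDMapRef = std::shared_ptr<const OSDMap>;
struct Session {
  epoch_t last_sent_epoch = 0;
};

// Single entry point for map sharing, used for both clients and peer OSDs.
void maybe_share_map(Session* s, const OSDMapRef& osdmap,
                     epoch_t peer_epoch_lb = 0) {
  // Take the best information we have about the peer's epoch: what we last
  // sent, or the epoch the peer claimed in the message we are handling.
  epoch_t peer_has = std::max(s->last_sent_epoch, peer_epoch_lb);
  if (peer_has >= osdmap->get_epoch())
    return;  // peer is already up to date; nothing to share
  // send_incremental_map(peer_has, ..., osdmap);  // elided
  s->last_sent_epoch = osdmap->get_epoch();
}
```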
PeeringState: don't zero backfill target num_bytes on activation
834d3c19a774f1cc93903447d91d182776e12d18 preserves num_bytes
on backfill targets in order to estimate space required to complete
backfill. However, from activation until the backfill reservation,
info.stats.stats.sum.num_bytes is persisted to disk as 0, corrupting
the stats in future intervals. Instead, preserve it in the info sent
during recovery and leave it alone in RequestBackfillPrio.
Additionally, it's possible for backfill to be preempted between
last_backfill=MAX being sent to the replica and the Backfilled event
being queued. In that case, the stats get zeroed again on reservation
and the replica ends up with invalid stats.
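A hedged sketch of the preservation described above; the stand-in types only mirror the field path named in the prose:
```
#include <cstdint>

// Stand-ins that only mirror the field path named above.
struct object_stat_sum_t { int64_t num_bytes = 0; };
struct object_stat_collection_t { object_stat_sum_t sum; };
struct pg_stat_t { object_stat_collection_t stats; };
struct pg_info_t { pg_stat_t stats; };

// Build the info sent to a backfill target during recovery, keeping the
// target's previously learned num_bytes instead of persisting 0.
pg_info_t info_for_backfill_target(const pg_info_t& info,
                                   int64_t preserved_num_bytes) {
  pg_info_t out = info;
  out.stats.stats.sum.num_bytes = preserved_num_bytes;  // keep the estimate
  return out;
}
```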
Samuel Just [Wed, 10 Apr 2019 02:29:05 +0000 (19:29 -0700)]
osd/: condense missing mutations for recovery/repair/errors
At a high level, this patch attempts to unify the various
sites at which some combination of
- mark object missing in one or more pg_missing_t
- mark object needs_recovery in missing_loc
- manipulate the locations map based on external information
occurs. It seems to me that the pg_missing_t and missing_loc
should be in sync except for the mark_unfound_lost_revert
case and the case where we are about to do a backfill push.
This patch also cleans up repair_object. It sort of worked by accident
for non-ec, non-primary bad peers: it didn't update missing_loc, so
needs_recovery() returned the wrong answer. However, is_unfound() did
as well, so ReplicatedBackend was nevertheless happy, as the object would
be present on the primary. This patch makes the behavior uniform with
the other force_object_missing cases.
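A hedged sketch of what a condensed mutation site could look like; every type here is a minimal stand-in for the real pg_missing_t/missing_loc machinery:
```
#include <map>
#include <set>
#include <string>

// Minimal stand-ins; the real pg_missing_t/missing_loc are far richer.
using hobject_t = std::string;
using pg_shard_t = int;
struct pg_missing_t {
  std::set<hobject_t> missing;
  void add(const hobject_t& oid) { missing.insert(oid); }
};
struct MissingLoc {
  std::map<hobject_t, std::set<pg_shard_t>> locations;
  void remove_location(const hobject_t& oid, pg_shard_t shard) {
    locations[oid].erase(shard);
  }
};

// One mutation site that updates the per-peer missing set and missing_loc
// together, so the two structures cannot drift out of sync.
void force_object_missing(std::map<pg_shard_t, pg_missing_t>& peer_missing,
                          MissingLoc& missing_loc,
                          const std::set<pg_shard_t>& bad_peers,
                          const hobject_t& oid) {
  for (auto shard : bad_peers) {
    peer_missing[shard].add(oid);             // mark missing on that peer
    missing_loc.remove_location(oid, shard);  // and drop it as a location
  }
}
```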
Samuel Just [Sun, 21 Apr 2019 01:51:08 +0000 (18:51 -0700)]
test-erasure-eio: first eio may be fixed during recovery
The changes to the way EC/ReplicatedBackend communicate read
errors had a side effect of making the first eio on the object in
TEST_rados_get_subread_eio_shard_[01] repair itself, depending
on the timing of the killed osd recovering. The test should
be improved to actually test that behavior at some point.
PeeringState was the only remaining caller -- both the interface in
PGBackend and the pass-through in PrimaryLogPG were actually unused.
Further, the WaitLocalRecoveryReserved user does not appear to actually
require the fresh-off-of-the-wire osdmap available from OSDService.
As such, this patch moves the logic into OSDMap itself and simply uses
the PG's local osdmap.
Note, the above is a change in behavior that probably could use a second
opinion.
Samuel Just [Fri, 29 Mar 2019 18:06:26 +0000 (11:06 -0700)]
osd/: move Active purged_snaps handling back to PG
Note: this patch only sets dirty_info/dirty_big_info when
info.purged_snaps is actually changed. I *think* that's
right, but that part deserves particular attention during
review.
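A minimal sketch of the conditional dirtying being described; stand-in types only (the real purged_snaps is an interval set, not a plain std::set):
```
#include <cstdint>
#include <set>

using snapid_t = uint64_t;
struct pg_info_t { std::set<snapid_t> purged_snaps; };

// Only mark the info dirty when purged_snaps actually changes.
void maybe_update_purged_snaps(pg_info_t& info,
                               const std::set<snapid_t>& new_purged,
                               bool& dirty_info, bool& dirty_big_info) {
  if (info.purged_snaps != new_purged) {
    info.purged_snaps = new_purged;
    dirty_info = true;
    dirty_big_info = true;  // purged_snaps is part of the "big" info
  }
}
```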
osd/: move io reservation machinery into PeeringState
This patch recasts the reservation and backoff interfaces
on PeeringListener in terms of events to queue rather than
explicit callbacks (PG implements the events via callbacks).
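A hedged sketch contrasting the two interface shapes; both declarations are assumed forms for illustration, not the actual Ceph API:
```
#include <memory>

// Stand-in event type; the real code uses boost::statechart events.
struct PGPeeringEvent { const char* name; };
using PGPeeringEventRef = std::shared_ptr<PGPeeringEvent>;

// Before (assumed shape): the listener exposed explicit callbacks.
struct OldListener {
  virtual void on_local_recovery_reserved() = 0;
  virtual ~OldListener() = default;
};

// After (assumed shape): the interface is expressed as events to queue;
// PG's implementation is free to back this with its callback machinery.
struct PeeringListener {
  virtual void queue_peering_event(PGPeeringEventRef evt) = 0;
  virtual ~PeeringListener() = default;
};
```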
sjust@redhat.com [Fri, 22 Mar 2019 23:19:18 +0000 (16:19 -0700)]
osd/: move most peering state to PeeringState
This patch moves the 40-something peering state variables into
PeeringState while leaving references to them in PG. The following
patches will move over the users until all users are in PeeringState.
Then, the PG references will be removed. A subsequent patch will also
make the recovery_state member the last initialized and the first
destructed.
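A minimal sketch of the transitional pattern, with stand-in members; the reference aliases are what let existing PG users compile unchanged:
```
#include <cstdint>

// Minimal stand-ins for two of the moved members.
struct pg_info_t { uint64_t last_epoch_started = 0; };
struct PastIntervals {};

// The state now lives in PeeringState...
struct PeeringState {
  pg_info_t info;
  PastIntervals past_intervals;
};

// ...while PG keeps references to it, so existing users compile unchanged
// until the later patches migrate them.
struct PG {
  PeeringState recovery_state;
  pg_info_t& info = recovery_state.info;
  PastIntervals& past_intervals = recovery_state.past_intervals;
};
```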
Samuel Just [Thu, 21 Mar 2019 23:25:17 +0000 (16:25 -0700)]
PGStateUtils: remove PG*, move PGStateHistory into PeeringState
I don't think there's any need to worry about the pg locking from
PGStateHistory. NamedState::begin/exit and dump_pgstate_history are the
only users, so the only calls should be under the peering event handler,
the asok command, or the PG constructor and destructor. The first two
already have the lock. The last should be safe as well, as long as
the state machine states are constructed after and destructed before
the PGStateHistory instance. As such, this patch removes most
of that state, leaving the epoch generation as an interface
implemented by PG.
The snap trimming state machine was already excluded, so this
patch leaves it disabled.
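A minimal sketch of what the epoch-generation interface could look like (shape assumed):
```
#include <cstdint>

using epoch_t = uint32_t;

// PGStateHistory no longer holds a PG*; it only needs something that can
// report the current map epoch, and PG implements that.
struct EpochSource {
  virtual epoch_t get_osdmap_epoch() const = 0;
  virtual ~EpochSource() = default;
};
```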
Samuel Just [Fri, 12 Apr 2019 18:08:54 +0000 (11:08 -0700)]
osd/: mechanically rename RecoveryMachine/State/Ctx to Peering*
I'm going to extract this logic and reuse it in crimson. Recovery* has
always been a confusing name as it implements neither log-based recovery
nor backfill. Rather, it's mainly the business logic for agreeing on
an authoritative log and some ancillary things such as scrub/backfill
reservation.
$ for i in $(git grep -l 'RecoveryMachine'); do sed -i 's/RecoveryMachine/PeeringMachine/g' $i; done
$ for i in $(git grep -l 'RecoveryState'); do sed -i 's/RecoveryState/PeeringState/g' $i; done
$ for i in $(git grep -l 'RecoveryCtx'); do sed -i 's/RecoveryCtx/PeeringCtx/g' $i; done
common: Clang requires a default constructor, but it can be empty
Just do what the error messages ask for.
Error from Clang:
```
In file included from /home/jenkins/workspace/ceph-master/src/cls/rbd/cls_rbd.cc:28:
In file included from /home/jenkins/workspace/ceph-master/src/objclass/../include/types.h:21:
In file included from /home/jenkins/workspace/ceph-master/src/include/uuid.h:9:
In file included from /home/jenkins/workspace/ceph-master/src/include/encoding.h:17:
In file included from /usr/include/c++/v1/set:426:
In file included from /usr/include/c++/v1/__tree:16:
/usr/include/c++/v1/memory:2241:41: error: call to implicitly-deleted default constructor of '__compressed_pair_elem<ceph::BitVector<'\x02'>::NoInitAllocator, 1>'
: _Base1(std::forward<_Tp>(__t)), _Base2() {}
^
/usr/include/c++/v1/vector:437:7: note: in instantiation of function template specialization 'std::__1::__compressed_pair<unsigned int *, ceph::BitVector<'\x02'>::NoInitAllocator>::__compressed_pair<nullptr_t, true>' requested here
__end_cap_(nullptr)
^
/usr/include/c++/v1/vector:496:5: note: in instantiation of member function 'std::__1::__vector_base<unsigned int, ceph::BitVector<'\x02'>::NoInitAllocator>::__vector_base' requested here
vector() _NOEXCEPT_(is_nothrow_default_constructible<allocator_type>::value)
^
/home/jenkins/workspace/ceph-master/src/common/bit_vector.hpp:163:3: note: in instantiation of member function 'std::__1::vector<unsigned int, ceph::BitVector<'\x02'>::NoInitAllocator>::vector' requested here
BitVector();
^
/home/jenkins/workspace/ceph-master/src/cls/rbd/cls_rbd.cc:3289:16: note: in instantiation of member function 'ceph::BitVector<'\x02'>::BitVector' requested here
BitVector<2> object_map;
^
/usr/include/c++/v1/memory:2179:39: note: explicitly defaulted function was implicitly deleted here
_LIBCPP_INLINE_VISIBILITY constexpr __compressed_pair_elem() = default;
^
/usr/include/c++/v1/memory:2172:50: note: default constructor of '__compressed_pair_elem<ceph::BitVector<'\x02'>::NoInitAllocator, 1, true>' is implicitly deleted because base class 'ceph::BitVector<'\x02'>::NoInitAllocator' has no default constructor
struct __compressed_pair_elem<_Tp, _Idx, true> : private _Tp {
^
1 error generated.
```
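A minimal sketch of the shape that fix could take, assuming NoInitAllocator wraps a std::allocator; the real class has more to it:
```
#include <cstdint>
#include <memory>

// Give the allocator the explicit default constructor libc++'s
// __compressed_pair_elem needs; the body can stay empty.
struct NoInitAllocator : public std::allocator<uint32_t> {
  NoInitAllocator() {}
};
```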
Fixes: http://tracker.ceph.com/issues/39561
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>